This was a common approach called a "honeypot". As I recall, bots eventually overcame it by evaluating the visibility of elements and only filling out visible fields. We then started keeping the element technically visible (i.e. not `display: none` or `visibility: hidden`) and instead absolutely positioning it off screen. Then the bots started checking for that as well. They also got better at reading the text for each input.
Interestingly, the award was specifically for the impact of AlphaFold2, which won CASP 14 in 2020 using the EvoFormer architecture evolved from the Transformer. It was not for the original AlphaFold, which won CASP 13 in 2018 with a collection of separately trained ML models and which, despite winning, performed at a much lower level than AlphaFold2 would two years later.
I've gotten so much push-back over the years when I've asserted that startups are too quick to describe their products as AI in industries where the target customer considers it undesirable. I think it's done for ego and to impress investors. Examples I've seen include products in clinical diagnosis and financial accounting. Some needs require utmost predictability, observability, and ultimately, understandability.
Of course, there are some industries and markets that desire the capabilities only AI can provide. But that's the point. Analysis should precede the message. We should market the benefits. I've seen a few people at least claim AI isn't a benefit, it's a feature. I'd argue it's not even a feature, it's an implementation detail, like using an object-oriented programming language or a relational database; it has advantages and disadvantages.
Focus on the needs of the customer and industry. Describe the benefits. For customers and investors alike, remove the AI veil of opacity by plainly describing what the AI is doing, how, and why.
It's interesting to see a study that seems to corroborate my anecdotal experiences. It's a marketing study though, so it shouldn't be overly generalized until more studies reproduce the results. Studies about human behavior tend to be difficult to reproduce and can yield conflicting conclusions. I wouldn't be surprised to see another study with slightly different questions or methods come to the opposite conclusion, especially if they don't control for consumer segments, industries, or types of products.
It's a little worse than that. If you need to upgrade at least once per billing cycle, the system keeps charging you more each cycle (since each upgrade always has a higher total cost than the previous one) while you accrue credits that never start paying down unless/until you have a billing cycle without an upgrade. That's not a great design for most systems, but it's especially bad for a scalable image service intended to facilitate the flexibility of growth.
> And it turns out that a lot of human reasoning is statistically predictable enough in writing that you can actually obtain reasoning-like behavior just by having a good auto-complete model.
I would disagree with this on a technicality that changes the conclusion. It's not that human reasoning is statistically predictable (though it may be); it's that all of the writing that has ever described human reasoning, on an unimaginable number of topics, is statistically summarizable. A good auto-complete model therefore does a good job of describing human reasoning that has previously been described, at least combinatorially, across various sources.
We don't have direct access to anyone else's reasoning. We infer their reasoning by seeing/hearing it described, then we fill in the blanks with our own reasoning-to-description experiences. When we see a model that's great at mimicking descriptions of reasoning, it triggers the same inferences, and we conclude similar reasoning must be going on under the hood. It's like the ELIZA Effect on steroids.
It might be the case that neural networks could theoretically, eventually reproduce the same kind of thinking we experience. But I think it's highly unlikely it'd be a single neural network trained on language, especially given the myriad studies showing the logic and reasoning capabilities of humans that are distinct from language. It'd probably be a large number of separate models trained on different domains that come together. At that point though, there are several domains that would be much more efficiently represented with something other than a neural network model, such as the modeling of physics and mathematics with equations (just because we're able to learn them with neurons in our brains doesn't mean that's the most efficient way to learn or remember them).
While a "sufficiently huge autocomplete model" is impressive and can do many things related to language, I think it's inaccurate to claim they develop reasoning capabilities. I think of transformer-based neural networks as giant compression algorithms. They're super lossy compression algorithms with super high compression ratios, which allows them to take in more information than any other models we've developed. They work well, because they have the unique ability to determine the least relevant information to lose. The auto-complete part is then using the compressed information in the form of the trained model to decompress prompts with astounding capability. We do similar things in our brains, but again, it's not entirely tied to language; that's just one of many tools we use.
> We don't have direct access to anyone else's reasoning. We infer their reasoning by seeing/hearing it described, then we fill in the blanks with our own reasoning-to-description experiences. When we see a model that's great at mimicking descriptions of reasoning, it triggers the same inferences, and we conclude similar reasoning must be going on under the hood. It's like the ELIZA Effect on steroids.
I don't think we know enough about how these things work yet to conclude that they are definitely not "reasoning" in at least a limited subset of cases, in the broadest sense wherein ELIZA is also "reasoning" because it's following a sequence of logical steps to produce a conclusion.
Again, that's the point of TFA: something in the linear algebra stew does seem to produce reasoning-like behavior, and we want to learn more about it.
What is reasoning if not the ability to assess "if this" and conclude "then that"? If you can do it with logic gates, who's to say you can't do it with transformers or one of the newer SSMs? And who's to say it can't be learned from data?
In some sense, ELIZA was reasoning... but only within a very limited domain. And it couldn't learn anything new.
> It might be the case that neural networks could theoretically, eventually reproduce the same kind of thinking we experience. But I think it's highly unlikely it'd be a single neural network trained on language, especially given the myriad studies showing the logic and reasoning capabilities of humans that are distinct from language. It'd probably be a large number of separate models trained on different domains that come together.
Right, I think we agree here. It seems like we're hitting the top of an S-curve when it comes to how much information the transformer architecture can extract from human-generated text. To progress further, we will need different inputs and different architectures / system designs, e.g. something that has multiple layers of short- and medium-term working memory, the ability to update and learn over time, etc.
My main point is that while yes, it's "just" super-autocomplete, we should consider it within the realm of possibility that some limited form of reasoning might actually be part of the emergent behavior of such an autocomplete system. This is not AGI, but it's both suggestive and tantalizing. It is far from trivial, and greatly exceeds what anyone expected should be possible just 2 years ago. If nothing else, I think it tells us that maybe we do not understand the nature of human rationality as well as we thought we did.
> What is reasoning if not the ability to assess "if this" and conclude "then that"?
A lot of things. There are entire fields of study which seek to define reasoning, breaking it down into areas that include logic and inference, problem solving, creative thinking, etc.
> If you can do it with logic gates, who's to say you can't do it with transformers or one of the newer SSMs? And who's to say it can't be learned from data?
I'm not saying you can't do it with transformers. But what's the basis of the belief that it can be done with a single transformer model, and one trained on language specifically?
More specifically, the papers I've read so far that investigate the reasoning capabilities of neural network models (not just LLMs) seem to indicate that they're capable of emergent reasoning about the rules governing their input data. For example, being able to reverse-engineer equations (and not just approximations of them) from input/output pairs. Extending these studies would indicate that large language models are able to emergently learn the rules governing language, but not necessarily much beyond that.
It makes me think of two anecdotes:
1. How many times have you heard someone say, "I'm a visual learner"? They've figured out for themselves that language isn't necessarily the best way for them to learn concepts to inform their reasoning. Indeed there are many concepts for which language is entirely inefficient, if not insufficient, to convey. The world's shortest published research paper is proof of this: https://paperpile.com/blog/shortest-papers/.
2. When I studied in school, I noticed that for many subjects and tests, sufficient rote memorization became indistinguishable from actual understanding. Conversely, better understanding of underlying principles often reduced the need for rote memorization. Taken to the extreme, there are many domains for which sufficient memorization makes actual understanding and reasoning unnecessary.
Perhaps the debate on whether LLMs can reason is a red herring, given that their ability to memorize surpasses any human's by many orders of magnitude. Perhaps this is why they seem able to reason, especially given that our only indication so far is the language they output. The most useful use-cases are typically those that trigger our own reasoning more efficiently, rather than those that rely on theirs (which may not exist).
I think the impressiveness of their capabilities is precisely what makes exaggeration unnecessary.
Saying LLMs develop emergent logic and reasoning, I think, is a stretch. Saying it's "within the realm of possibility that some limited form of reasoning might actually be part of the emergent behavior" sounds more realistic to me, though rightly less sensational.
EDIT:
I also think it's fair to say that the ELIZA program had the limited amount of reasoning that was programmed into it. However, the point of the ELIZA study was that it showed people's tendency to overestimate the amount of reasoning happening, based on their own inferences. This is significant because it causes us to overestimate the generalizability of the program, which can lead to unintended consequences as reliance increases.
Do you mean for AI to do the entire job of researching, creating, testing, manufacturing, and distributing a cure, or just for AI to be involved? And do you mean completely eradicating a disease, or just producing a cure for it? And do you mean an outright cure, or also a treatment or vaccine? If the latter in all cases, here's an example:
First, I agree it's important to put Tesla recalls in context with the greater automotive industry, so I think what you posted is great info. The only thing I'd disagree with is the idea that the numbers compared to Tesla's, and the relative media coverage, are surprising. They seem expected to me.
Tesla bills itself not as an automaker, but as a tech company. So it makes sense they'd have a larger media footprint, covering not just the automotive industry media but the tech industry media as well. This isn't unfair, considering they get the benefit of a tech-based market cap to go with it [1].
They also put themselves in headlines more often than other car companies, with outlandish claims such as Musk saying, "At this point, I think I know more about manufacturing than anyone currently alive on earth." [2] When Musk and the company put themselves in headlines so often, it makes sense that the media would cover them more. This is likely a deliberate part of their advertising strategy of creating buzz [3], so I think media coverage of their failures is a direct result as well.
You could argue that's Musk, not the company, but they made the strategic decision that Musk is their PR function when they became the only car company to dissolve their PR department in 2020 [4].
One last thing I noticed is that the recall data comes from the NHTSA [5], which doesn't seem to distinguish recalls between different brands owned by the same company (for example, Ford's recalls appear to include both Ford and Lincoln, GM's include Chevrolet, GMC, etc.). Tesla's 20 recalls in 2022 cover, I believe, the four models they made in 2022, while Ford's 67 recalls are spread across the 39 models under the Ford brand and five models under Lincoln (I counted these by looking at the drop-down selectors on KBB's value estimator [6]).
In short, Tesla exploits the hype machine; is it surprising that their recalls are hyped as well?
I did something similar to this a while back with a one-liner aliased in my Bash includes, called gitsum (short for git summary).
alias gitsum='git log --pretty=format:"* %s" --author `git config user.email`' # list my own commits as Markdown bullets
It gives my git commit messages as a Markdown bullet-point list. Unlike the linked gist, it only works per-branch, but one cool thing about it is that you can tack on additional git flags, such as --since. For example:
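Something like this should work (the date range here is just an illustrative placeholder):

gitsum --since="2 weeks ago"

That lists only my commits on the current branch from the last two weeks, still formatted as Markdown bullets, since the extra flag is simply appended to the expanded git log command.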
I think my favorite part of this prompt is that it starts with, "Please..."
With this new class of products based on crafting prompts that best exploit a GPT's algorithm and training data, are we going to start seeing pull requests that tweak individual parts or words of the prompt? I'm also curious what the test suite for projects like this would look like: presumably checking that responses to specific inputs contain specific facts or phrases.
A lot of good feedback here. One additional thought I'd provide is that I wouldn't rule someone out based on a "Contact us for price" button. I'd click the button, let them know what you need, and then let them rule themselves out based on their responses. Sometimes they may give you a price directly in their response (or at least a ballpark), sometimes they may insist on a call. Take the call, but draw the line where you're comfortable (e.g. give me a ballpark after 30 minutes or I'm out - you can be more tactful in your wording).
Think about it from the company's perspective. Let's imagine as a company, you do an experiment where you have one landing page that shows the pricing, and another landing page that says "contact us". Let's also imagine that your product is enough of an enterprise solution that no one ever buys it without talking to someone at some point during the evaluation process (even if the pricing is up front, it requires enough of an investment that the customer wants to be absolutely certain it will satisfy their needs both now and as they grow). Finally, let's imagine that you have 3 full-time sales people to handle the incoming communications at whatever step of the evaluation process.
Now, if the outcome of this experiment is that the "contact us for pricing" page results in 1/5th the incoming contacts, but those then convert at twice the rate, you might choose the up-front pricing (2x the conversion rate on only 20% of the leads means just 40% of the sales you'd get with up-front pricing).
However, what if the 5x incoming contacts are too many for your sales team of 3 to respond to, causing the up-front pricing to result in fewer sales with more work? Then you might choose the "contact us for pricing".
But what if it's so many more incoming contacts at a consistent enough conversion rate that you can justify adding a 4th and 5th sales person to realize those sales? Then you might choose the up-front pricing.
But what if the up-front pricing pigeon-holes you into your beachhead market and makes it more difficult to expand vertically or horizontally, which could lead to a trail-off in the incoming contacts and sales? Then you might choose the "contact us for pricing".
The point is, the one that makes the most sense for a company depends on a lot of variables, most of which would be opaque to an outside observer. Being able to tell the difference between a company that has "contact us for pricing" because it made sense for them, compared to one trying to exploit price discrimination (as described by some of the more cynical takes) is next to impossible without talking with them. Even in that scenario though, if they end up giving you a better product for a better price, then being willing to reach out to them could end up being a competitive advantage for your company over another which disqualified them on that basis.