The makers of Eleuther hope it will be an open source alternative to GPT-3 (wired.com)
145 points by webmaven on March 29, 2021 | 95 comments


As someone who works on a Python library solely devoted to making AI text generation more accessible to the normal person (https://github.com/minimaxir/aitextgen ) I think the headline is misleading.

Although the article focuses on the release of GPT-Neo, even GPT-2, released in 2019, was good at generating text; it just spat out a lot of garbage requiring curation, which GPT-3/GPT-Neo still require, albeit with a better signal-to-noise ratio. Most GPT-3 demos on social media are survivorship bias. (In fact, OpenAI's rules for the GPT-3 API strongly encourage curating such output.)

GPT-Neo, meanwhile, is such a big model that it requires a bit of data engineering work to get operating and generating text (see the README: https://github.com/EleutherAI/gpt-neo ), and it's unclear currently if it's as good as GPT-3, even when comparing models apples-to-apples (i.e. the 2.7B GPT-Neo with the "ada" GPT-3 via OpenAI's API).

That said, Hugging Face is adding support for GPT-Neo to Transformers (https://github.com/huggingface/transformers/pull/10848 ) which will help make playing with the model easier, and I'll add support to aitextgen if it pans out.
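
Once that lands, usage should look like any other causal LM in Transformers. A rough sketch, assuming the checkpoints get published under the EleutherAI namespace (the model ID below is my guess, not confirmed):

    from transformers import AutoModelForCausalLM, AutoTokenizer

    model_name = "EleutherAI/gpt-neo-2.7B"  # assumed Hub ID, not confirmed yet
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer("EleutherAI's goal is", return_tensors="pt")
    outputs = model.generate(**inputs, do_sample=True, max_length=60, temperature=0.9)
    print(tokenizer.decode(outputs[0], skip_special_tokens=True))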


Totally off topic: can you fix the pip3 installer for aitextgen? I just filed an issue on the GH issue tracker.


I don't know when it started (it's been years since I wrote anything in Python), but over the last couple of years I've been seeing way, way more of these generic Python environment/configuration errors around the internet that are hard to diagnose and debug. Has something happened recently with Python and its configuration and dependency management?


It's more of a Homebrew issue, which is likely what the OP hit. (incidentally I just hit a similar-but-unrelated Homebrew Python environment issue)


People already believe garbage at a pretty alarming rate. It's easy to guess at a number of possible outcomes here:

- More junk text moves the public to doubt legitimate information even further than they currently do.

- There is so much human-generated junk text that adding more of it via AI actually doesn't have much of an effect.

- People return to lean on experts, perhaps even more than before. (just as a number of tech-literate folks have now returned to relying on brand name.)

Speculation is easy of course, so who knows what will actually happen.


True. But a simple API to generate junk text? It can scale, cheaply, beyond measure.

No need for a troll farm, hiring, managing and training tens or hundreds of people.

A reasonable amount of cash, a bit of motivation, some moderate technical skills, and voilà! Anyone can compete with the Russian troll farms now and build their own networks of hundreds, or hundreds of thousands, of sufficiently credible fake accounts that pass as human, spewing garbage and patting each other on the back via likes, retweets and whatnot.

All with the appropriate fake news blogs and sites happily churning out grammatically correct nonsense that makes (enough) sense.

Basically, this kid’s dream: https://www.nbcnews.com/news/world/fake-news-how-partying-ma...


And even the Russian/Chinese bot operations can step up, as it is now much easier to flood forums (like Reddit) wherever, e.g., an article critical of China appears, to kill any discussion.


I find it easier to identify humans that flood forums though. Especially non-native speakers usually are somewhat easy to spot, I assume that's true in any language. That's different for ML-generated texts. On the other hand, human texts are more "on message", but if all you want to do is create noise, I guess you don't need to have targeted communications.


> if all you want to do is create noise, I guess you don't need to have targeted communications.

This is key in anti-extremist operations on anonymous boards. 4chan and other similar sites are absolutely nothing like they were a decade ago, I presume because of such bots flooding them with noise.


Oh, definitely. Existing operations also absolutely can leverage that in order to amplify their reach and capabilities.


And unfortunately, the web will be forced to move toward verified human identities to fight such junk, and anonymous browsing will become a thing of the past.


That wouldn't work. You can already rent humans and sim cards to do captchas and phone verification. There's nothing stopping someone from verifying their bots.


It might be neither easy nor impossible. You could imagine a reputation system that depends on what others think of you. Start the network with just people the developers know personally. Admit new people when enough existing members vouch for them. Sometimes fake accounts will appear, but they'll sometimes get sniffed out, and the people who let them in punished somehow (temporarily blocked? Delete photos at random? I don't know) strictly enough that people generally don't want to let in bad actors.
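
A toy sketch of what I mean (the thresholds and penalties are arbitrary placeholders, not a worked-out design):

    from collections import defaultdict

    VOUCHES_REQUIRED = 3
    PENALTY = 5

    class Network:
        def __init__(self, founders):
            self.members = set(founders)              # seeded with people you trust
            self.reputation = defaultdict(lambda: 10)
            self.vouchers = {}                        # member -> who let them in

        def vouch(self, voucher, candidate):
            if voucher not in self.members or candidate in self.members:
                return
            self.vouchers.setdefault(candidate, set()).add(voucher)
            if len(self.vouchers[candidate]) >= VOUCHES_REQUIRED:
                self.members.add(candidate)

        def flag_fake(self, member):
            # A confirmed fake costs everyone who vouched for it.
            self.members.discard(member)
            for voucher in self.vouchers.pop(member, ()):
                self.reputation[voucher] -= PENALTY   # or a temporary block, etc.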


I think we may come to see the era of roughly 1990-2010 as the golden age of information: relative abundance creating new opportunity, before the noise drowned it all out.

I suspect that in the future people will, ironically, return more strictly to tribal knowledge, as the media and the internet will be (already is) a vast ocean from which you can pull anything you want to believe. Thus nothing you see or hear from mass media or the internet can be trusted, there are no experts, and you go back to information scarcity as you have to rely on your immediate human network for trust. Actually I think we’re already seeing the return to tribal authority, the early waves are already here on Facebook and YouTube... they just haven’t devolved to strictly local circles of trust yet.


Neal Stephenson has a cool prediction on how to handle this in one of his latest books: curators. People of means will have a curated view of the internet and information, and it costs $$$.


In many ways, predicting the future is just about looking at how to apply past trends to our modern world. People and organizations with the resources to do so are actively working to cram their "curation" into the public sphere, to try and sway the masses. That's advertising. It's the same thing it was in the 20th century, just more personal and using more sinister means of hogging your attention span. Peoples' intuition about "advertising" is as a commercial enterprise, but I'm certain all the textbooks that talk about the 2010s and 2020s will mention the role of mass bot/shill campaigns to sway the public by governments, companies, organizations, individual people.


It’s already here. How many people are paying for newsletter subscriptions etc, and how quickly is that growing? :)


There is a market (well, a need at least) for nonsense detectors that work the way ad blockers do: detect internal inconsistencies, non-sequiturs, low information density and other similar reasons to avoid reading the text, and visibly flag or block it.

That should eliminate 80%+ of existing human-generated text content and lead to text generators composing useful articles.
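
A crude sketch of one signal such a blocker could use: flag text whose perplexity under a small language model is suspiciously low (formulaic/repetitive) or very high (word salad). The thresholds here are invented, and a real detector would need much more than this:

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    def perplexity(text):
        ids = tokenizer(text, return_tensors="pt").input_ids
        with torch.no_grad():
            loss = model(ids, labels=ids).loss  # mean cross-entropy per token
        return torch.exp(loss).item()

    def looks_like_junk(text):
        ppl = perplexity(text)
        return ppl < 15 or ppl > 500  # arbitrary demo cutoffs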


> just as a number of tech-literate folks have now returned to relying on brand name

out of curiosity, what are you referring to?


In the early-ish days of the consumer internet, consumers had a new and huge information advantage over companies. People moved from relying on brand name, to reading online reviews. Often finding niche brands which they had otherwise not heard of.

Now, in 2021, that experience is flipped on its head. Amazon reviews are gamed and cannot be trusted. Companies build niche brands like fly-by-night companies, and the lesser known brands have a very high chance of being both seriously inferior, and also short lived.

At least, this has been my experience, and the experience of some others.

[edit]

And as further anecdotal proof that things have come full circle, my elderly mother-in-law keeps getting tricked by Amazon purchases. "The reviews were good," she'll say before returning something.


I just rely on reviewers like NYT Wirecutter and then buy whatever the reviewers suggest (and is cheap) on Amazon.


Why do you imagine the NYT isn't taking a bigger paycheck for saying the same thing the fake Amazon reviews are saying? We live in a world where brands are trash. They're built by the lowest bidder with the absolute cheapest parts ("the Toyota Way", heavy airquotes), and packaged with lies (marketing). Where did this idea come from that it's only the foreign resellers of Chinese products that are posting fake Amazon reviews, when it is absolutely the same case with household brand name companies? Go to Fakespot and look up random brand name companies you would normally "trust" and see how Fakespot rates them. We are WAY past the point where the vast majority of companies (governments, organizations, etc.) aren't using these types of mass propaganda campaigns. A brand name in the general case is no more trustworthy than some imitator.


NYT's entire value proposition is that you can trust them. Amazon's is (mostly) that they'll get you what you asked for, fast and cheap. A bad Amazon review seems to redound, for whatever reason, mostly to the discredit of the particular seller, not Amazon. NYT therefore has more incentive to give you accurate reviews.


I am not inspired to trust Fakespot given their privacy policy.

"When you use our Services, we automatically collect the following information about you (collectively referred to as “Personal Information”):

    Your device information which includes, but is not limited to, information about your web browser, IP address, time zone, and some of the cookies that are installed on your device.
    Individual web pages or products that you view, what websites or search terms referred you to the Service, and information about how you interact with the Service.
    Your first and last name
    Your email address
    Your username associated with your Apple ID or Google Account  
    Your account information such as your account name, account password, other credentials, security questions, and confirmation codes"


That's fine, but it doesn't answer the question of why you would trust a NYT product "review": https://www.nytimes.com/privacy/privacy-policy


> People return to lean on experts

The problem with this is that people look at anybody confirming their bias as an expert. I can't tell you how many FB posts I've seen where some armchair poster claims that a researcher is wrong because of xyz and it's being reposted thousands of times.


People believe what's believable (even if backed up by garbage). GPTs don't make believable stuff, but they can be used to flower up some b.s. idea. Nothing that can't be done with a few hired trolls, and the proliferation of garbage will endanger the troll industry, as people will start becoming suspicious. So I doubt its impact can go beyond generating spam and noise.


Concrete prediction: There will be a global cult similar in nature to Qanon driven by an AI spitting out generated bullshit within the next ten years.

That's assuming some percentage of Qanon word salad isn't the output of Markov chain generators. A lot of it resembles low-order statistical text generator output after having been trained on a corpus of 1990s Usenet alt.conspiracy and the Protocols of the Elders of Zion.
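
For reference, a low-order generator like that is only a few lines of Python. A minimal word-level sketch, trained on whatever corpus you feed it:

    import random
    from collections import defaultdict

    def train(corpus, order=2):
        words = corpus.split()
        model = defaultdict(list)
        for i in range(len(words) - order):
            model[tuple(words[i:i + order])].append(words[i + order])
        return model

    def generate(model, order=2, length=50):
        state = random.choice(list(model))
        out = list(state)
        for _ in range(length):
            followers = model.get(tuple(out[-order:]))
            if not followers:
                break
            out.append(random.choice(followers))
        return " ".join(out)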


I don't know why the Eleuther project riles me up so much. Their work on the Pile gets to me because they're so cavalier about copyright (while I myself train on similarly pirated text datasets, but feel different because I don't redistribute them and am honest that it's pirated; to be clear, I'm rolling my eyes at my own rationalization right here). Their work on GPT-Neo riles me up because they do such a weak job comparing it to the models whose hype they're riding. It also riles me up because so many people just eat it up uncritically.

But it's all out of proportion. I think it's that last part (the uncritical reaction) that makes me blow this out of proportion.


> Their work on GPT-Neo riles me up because they do such a weak job comparing it to the models whose hype they’re riding.

Building open source infrastructure is hard. There does not currently exist a comprehensive open source framework for evaluating language models. We are currently working on building one (https://github.com/EleutherAI/lm-evaluation-harness) and are excited to share results when we have the harness built.
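
To give a flavor of what such a harness has to do: most tasks reduce to scoring candidate continuations under the model. A toy sketch of a last-token-prediction task (this is not the harness's actual API, and the examples are made-up stand-ins):

    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2")
    model.eval()

    examples = [  # (context, expected next token)
        ("The capital of France is", " Paris"),
        ("Two plus two equals", " four"),
    ]

    correct = 0
    for context, target in examples:
        ids = tokenizer(context, return_tensors="pt").input_ids
        with torch.no_grad():
            logits = model(ids).logits[0, -1]  # distribution over the next token
        correct += tokenizer.decode([int(logits.argmax())]) == target
    print(f"accuracy: {correct / len(examples):.2f}")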

If you don’t think the model works, you are welcome to not use it and you are welcome to produce evaluations showing that it doesn’t work. We would happily advertise your eval results side by side with our own.

I am curious where you think we are riding the hype /to/ so to speak. The attention we’ve gotten in the last two weeks has actually been a net negative from a productivity POV, as it’s diverted energy away from our larger modeling work towards bug fixes and usability improvements. We are a dozen or so people hanging out in a discord channel and coding stuff in our free time, so it’s not like we are making money or anything based on this either.


Hi! I’m the EAI person who your criticism of the Pile is most directed at. I’m curious if you read Sections 6.5 and 7 of the Pile working paper and, if so, what your response to it is. As you note, virtually everyone trains on copyrighted data and just ignores any implications of that fact. I feel that our paper is very upfront about this though, going as far as to have a table that explicitly lists which subsets contain copyrighted text.

Also, I realize that you don’t have any ways of knowing this but we also have separated out the subset of the Pile that we can confirm is licensed CC-BY-SA or more leniently. This wasn’t done in time for the preprint, but is in the (currently under review) peer reviewed publication. Unfortunately the conference rules forbid you from posting materials or updating preprints between Jan 1st 2021 and the final decision announcement. But we will be making the license-compliant subset of the Pile public when we are able to and will give it equal prominence on our website to the “full” Pile.

Also, we will be releasing a datasheet for the dataset but again conference limitations prevent us from doing so yet.

If you’re interested in talking about this in depth, feel free to send me an email.


Hi again! We had a back-and-forth about this a while back regarding the paper and I think we didn't end up on the same page regarding the "public data" definition in the paper (found it! [0]). I love that you're upfront in the paper, because it's silly how most people just don't acknowledge it (though they usually don't redistribute it publicly like the pile does).

I think the gist was us disagreeing about the relevance of

> Public data is data which is freely and readily available on the internet. This primarily excludes ... and data which cannot be easily obtained but can be obtained, e.g. through a torrent or on the dark web.

That last phrase is what got to me. It puts things in the same category that feel too different, e.g. the Harry Potter books vs. this comment I'm writing. They're both available within a few clicks from the search bar (one because I put it there, the other because it was put up against the wishes of the author and owners), but that commonality doesn't feel relevant.

Excluding torrents especially seems like a cop-out, explicitly to get around the issue of "X is the top result when I google it" being so common for torrents. I think you're trying to exclude that content because otherwise the definition would classify too much as public? But torrent vs. FTP doesn't feel at all relevant when it's just Google plus a click or three. Or searching on Pirate Bay plus a single click.

I imagine a judge looking at the copyright status of someone's pirate site and saying they can't redistribute the content, and the pirate responding "okay we'll take down the ftp server and put up a torrent instead, so that it's not public. If you google us (or search on pirate bay), the top result will stop saying 'X download' and now it'll say 'X download torrent'" and expecting the law to be on their side.

I didn't really buy the arguments in section 7 either. The usage points seem legitimate, but don't cover redistribution.

> But we will be making the license-compliant subset of the Pile public when we are able to and will give it equal prominence on our website to the “full” Pile.

This is fantastic and I want to sincerely thank you for that.

I'm trying not to be combative, but I feel like publicly redistributing other people's work does raise the bar quite a lot higher than just using it to train.

[0] https://news.ycombinator.com/item?id=25616218


I don't have a dog in this fight, but I think you should re-read this: "data which cannot be easily obtained but can be obtained, e.g. through a torrent or on the dark web."

It's an extra piece of engineering to reliably scrape torrents and the dark web and exclude spam traps. "Easily obtained" is probably as much about this as about the copyright aspects.

The person you are replying to is correct in saying that most people train on the "public web" (e.g., Common Crawl data). The copyright implications of this haven't been tested in court yet.

It is worth noting that Common Crawl data is widely distributed and would seem to raise the same issues you are identifying here.


Why would it matter, legally even? Once you have the pirated dataset you're merely letting the program analyze it, not copying it. The resulting network isn't a transformation of the copyrighted work; by the pigeonhole principle, weights far smaller than the corpus can't encode all of it. It's like reporting on the spelling of the corpus: the results aren't tainted by the legality of the access.

Also, what kind of joke would it be if we could only train AIs on text we were allowed to use? That much bias would make the result worthless at predicting the real world.


How many internet forum prophecy cults (you know, like the Q one) are or will be powered by these language models? It's often assumed, or at least easier to imagine, that the evaluator in Turing's test is a rational actor who possesses a high degree of skepticism. But it seems that a lot of the human population is ready and willing to believe wild claims with little or no evidence, and many people seek out information that confirms what they already believe.

As the cost of making such models becomes less and less, it seems inevitable, spin up many such models and see what sticks and/or combine some evolutionary process for feeding back user-engagement to fine-tune and adapt the models. How many of these influence machines will latch onto the language of existing religious traditions and how many might invent or spur on the development of entirely new ones? Maybe not exactly the "Age of Spiritual Machines" that some futurists predicted...

How far are we from "Show HN: I started a cult by training a model on the sermons of televangelists and MLM copy."


I think this reflects a common misconception about how cults work.

A machine to autogenerate cult-ish nonsense isn't needed. Humans are already incredibly good at doing this on their own.

Not only that: cults generally fine-tune themselves to fit their members.

A machine generating convincing lies still wouldn't meaningfully do as much as a human-operated, human-targeted attempt at a cult. Creating one is something basically any human can do; the required skillset is something most people possess.


"Show HN: How I increased ROI of my cult by 20% with AI-driven personalized advertisements."

Imagine GAN-generated pictures showing attractive people (optimized for your taste) attending imaginary members' meetings near you. It would basically be the same as the text-generation spam that eBay and Airbnb already use, but with pictures.


It'll be interesting to see how colleges and universities react to GPT-3. Students will surely use it to write entire assignments.


Have you read much GPT-3 stuff? While it's coherent in a sentence it is very rambling over paragraphs to pages. It could probably do fine for a grade school or bad high school paper. I think if you turned it in for college you'd get an F.

On an unrelated note, my fake daughter is now a TA, and the professor led off saying "we are in a golden age of cheating". They're going for way more short assignments, as it's a lot more work to cheat on those than on one make-or-break test.


Have you read college freshman essays? While it's coherent in a sentence it is very rambling over paragraphs to pages.


Yes, I've read them; would those pass an English composition class in college? This comment generated by GPT-3.



I think GPT essays could definitely pass a freshman expository writing class. I went to a pretty good university and when we did peer review I was pretty surprised at (what I considered) the low average quality of the writing.


Examples?


Looking back, I think freshman me was perhaps a bit harsh in my assessment of my peers. Here are two excerpts, one my own writing and one from a peer. Rereading them, I am not sure GPT3 could recreate either of these, but you can judge:

Peer: > The Gaza Conflict Gave Hamas what they needed to build an even deeper anti US narrative and anti-israeli narrative. The reasons that Israel was able to act the way it did so during these conflicts were:The civil war/Russian invasion of Ukraine, especially after the July 17 downing of Malaysian Airlines Flight 17; President Assad’s brutal tactics in the Syrian civil war, which seemed to be paying off at the time; and advance of the Islamic State in northwest Iraq and the horrific videos of their executions. As a way to show 0 tolerance for Islamic Radicalism President Obama and his administration gave Israel full support at the start of the Gaza operation.

Me: > Eleven years later, peace in Israel and Palestine seems no closer. Despite being ousted by Fatah from the West Bank, continually targeted by economic sanctions, and subject to military action, Hamas seems no closer to disappearing than when it was first founded (Milton-Edwards 212) and continues to be a major political force, particularly in the Gaza strip. What has changed, however, is that Hamas has grown desperate. In “A rare opportunity to influence Hamas,” Daniel Nisman argues that the increased isolation experienced by Hamas due to the international community's pressure is in fact counterproductive.


Here are some samples using the small GPT-2 (emphasis: small GPT-2!) in aitextgen, seeded with "The Gaza Conflict Gave Hamas what they needed " (first 3 samples, not cherry-picked):

> The Gaza Conflict Gave Hamas what they needed to survive and that's not something Israel can afford.

> The Gaza Conflict Gave Hamas what they needed to take control of Gaza. In the past few days I have had a few conversations with people who have been in Gaza who are very critical of the IDF. These people are often very, very angry. They are also very angry about the way the IDF is doing what it is supposed to be doing and what it is supposed to be doing. So in my mind, the issue of Hamas being in control of Gaza is not something that should go away. It is something that needs to be addressed. Q: What have you been hearing about since the war started? A: The IDF has been saying that it is going to allow Hamas to take over Gaza. Q: What has been particularly troubling to you lately? A: I have been hearing about this as well. Q: What has been more troubling to you recently? A: My friends have been saying that they want to call it Operation Protective Edge. Q: And, you know, what do they mean by that? A: They mean that they want to use it to make sure that they have enough force to take control of Gaza. Q: And that means that they want to make sure that they have enough

> The Gaza Conflict Gave Hamas what they needed?"

> The UN is now considering whether to send military advisers to Gaza to assist the country's military forces. The Security Council is considering whether to send the equipment, along with the training, to a military operation in the region. The US is also considering sending a team of US special forces to assist the Palestinian armed forces in the conflict.

> The United Nations is now considering whether to send military advisers to Gaza to assist the country's military forces.

> Kerry's comments come as the US has been in touch with the Palestinians to offer support in exchange for a full ceasefire, and as the US continues to support the PA and Hamas, the two groups have been engaged in a long-running conflict with Israel in the Gaza Strip.

> In January, Kerry condemned Israel's "continued offensive against Gaza," saying the blockade was the "worst violation of international law on the part of the Israeli government and the civilian population of Gaza."

> The US is now considering whether to send military advisers to Gaza to assist the country's military forces. According to Reuters, the US Secretary of State John Kerry said this week that "there is no guarantee" that the US will send special forces "to the Gaza Strip

So yeah - not fantastic, but interestingly not terrible either. The non-factual but coherent nature of it is very troubling.
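
(If anyone wants to reproduce this kind of thing, the calls look roughly like this; the sampling parameters shown are the library defaults, not necessarily exactly what produced the samples above:)

    from aitextgen import aitextgen

    ai = aitextgen()  # downloads and caches the default small (124M) GPT-2
    ai.generate(
        n=3,
        prompt="The Gaza Conflict Gave Hamas what they needed ",
        max_length=256,
        temperature=0.7,
    )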


From reading these (esp. the last), you would think the US is allied with Palestine against Israel!


Anecdotally, I know a number of people who do university assignments for money. Many of their clients are university students with poorer than average English language skills, usually in intro writing courses. I'd be terrified if I were one of them right now.

GPT-3 would be a godsend for cheaters, but still requires a human to jump in and rewrite whole sections.

No, if you REALLY want to cheat using AI, you should most likely utilize either 1. abstractive summarizers (e.g. Pegasus) or 2. paraphrasing tools (e.g. https://quillbot.com/). I believe that Quillbot is primarily powered by masked language models (MLMs) like BERT rather than causal language models (CLMs) like GPT-2 (but someone who works there can enlighten me more).

Copy and paste a text that you want rewritten in your own words (e.g. the ideas of a really smart individual), and it rewrites it using totally different language while preserving the same meaning. (Old) plagiarism detection tools don't work on this, and hell, it's not hard to fool the newer ones. You can try tools for detecting if something was written by a particular model and weights (e.g. to prove they used GPT2-Medium), but if I fine-tuned those same weights, then proving plagiarism becomes exceedingly difficult.

Welcome to the brave new world of cheating. Also, techniques like this are coming to a CS department near you (in the form of source code generation powered by NLP models).
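
To make route 1 concrete, here's a minimal sketch using an off-the-shelf Pegasus checkpoint from the Hugging Face Hub (this is plain summarization; whatever Quillbot actually runs is unknown to me):

    from transformers import pipeline

    # google/pegasus-xsum is a public checkpoint; paraphrase-specific
    # fine-tunes also exist on the Hub.
    summarizer = pipeline("summarization", model="google/pegasus-xsum")
    text = "Paste the passage you want restated in different words here."
    print(summarizer(text, max_length=60, min_length=10)[0]["summary_text"])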


I feel like we're overestimating the value of generating text and selectively picking out things that sound good. This is up for debate, but you'd probably do better to read actual authors and pick things up from them and maybe making your own changes. The time spent parsing the output to see if it's good could be spent on thinking about something to write?

Would be neat to try and publish books with a percentage of AI generated text and see how well they do. Maybe there's a sweet spot for productivity.


GPT-3 is just as much of a cheat as using a thesaurus. New writing tools shouldn't be banned just because old people didn't have those tools.


Wolfram Alpha has been solving calculus problems for 12 years, and it is barely a footnote in how the college experience has changed.

So I would say this will likely just be there. It just is. It won't change anything; universities will acknowledge it, a headline or two will occur when its use is discovered in a paper that a student didn't even skim to make it less obvious, and most papers will fly under the radar.

Other kinds of assessments will still do their job.


Even long before GPT-3, a friend of mine did his thesis with generated text, at an engineering university, and received a B. This was 6 years ago. I have my own beefs with thesis' in general already, since two-thirds of it seems to be filled with redundant text to prove that you went to university. I guess it's a little bit different, since back then he had to actually work to generate it and now it's a lot easier.


> his thesis with generated text

6 years ago = probably LSTM.

He wrote an entire thesis with this and got a B? That seems implausible to me, but maybe I'm used to higher grading standards. Did he just use it to fill in parts of it?

Also, the plural of thesis is theses, not thesis' which implies the possessive.


How does that work? Don't you have to defend a thesis?


Corporate speeches, sermons, motivational talks, poetry, and political speeches. They are either not required to make sense or no one dares to interrupt.


A friend started an AI writing-improvement tool (https://outwrite.com), and when they initially started, they had a plagiarism-detection feature that teachers could use; I think they eventually stopped developing it.

If I recall correctly, the way it worked was to build up a model of this person's writing and how it compared to other people's, and then measure the likelihood that sentences and paragraphs matched the rest of the writing.

I suspect something similar could be done with GPT-x
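
Something like this sketch, say, with character n-gram features (all data here is placeholder; a real system needs far more text per student):

    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline

    # Placeholder corpora; in practice you'd want many paragraphs of each.
    student_paras = ["a paragraph known to be the student's", "another one"]
    other_paras = ["a paragraph by someone else", "yet another writer"]

    clf = make_pipeline(
        TfidfVectorizer(analyzer="char", ngram_range=(2, 4)),
        LogisticRegression(max_iter=1000),
    )
    clf.fit(
        student_paras + other_paras,
        [1] * len(student_paras) + [0] * len(other_paras),
    )

    # A low score on a newly submitted paragraph flags a stylistic
    # mismatch worth a human look; it proves nothing by itself.
    print(clf.predict_proba(["the newly submitted paragraph"])[0][1])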


I have recently started an experiment: an AI-generated newsletter[1]. All posts are generated by GPT-3; I work as the editor. It works well for some topics and not so well for others. Since I curate the content, I don't publish topics which are not done well. For example, I tried to make it generate a nice article on the Suez Canal crisis, but it was harder than I thought it would be.

It generates BuzzFeed-style stories very well though :)

[1] https://aifeed.substack.com/


GPT-3 doesn't know anything about the Suez Canal blockage. It only knows what it could have learned by googling "suez canal" on the date the last update was released. I imagine the newsletter content it created for you was mostly general background info about the canal.

Whenever GPT-3 is updated or a new version comes out, it will be able to speak much more intelligently about the topic. But of course any update will require re-doing all the careful tuning of prompts and models...


How do you go about generating these posts? I think I would like to play around with something like this but I am not sure where to start.


Are you using OpenAI API to generate these?


Yes.
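
(Roughly this shape of call, with the 2021-era openai library; the engine and parameters here are illustrative, not my exact settings:)

    import openai

    openai.api_key = "sk-..."
    response = openai.Completion.create(
        engine="davinci",  # engine choice is illustrative
        prompt="Write a short, upbeat newsletter item about electric cars:\n",
        max_tokens=300,
        temperature=0.7,
    )
    print(response.choices[0].text)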


It's very nice to see Eleuther fulfill the Open promise of OpenAI.

I'm scared that access to more and more big model advancements is being denied to the general public, which will just make the inequality between big corporations and startups even greater.



Will anyone care to read it? In a reductive, dystopian way, I am just looking for the authority figures in my ideological landscape to signal what my position should be on this or that topic. In this landscape, argument and evidence matter less than just communicating an "actionable" judgment. Maybe there could be a Rush Lim-bot. I suppose some iteration of GPT-foo will be good at generating genre-consistent narratives, but could that instead be screenplays that render as TikTok videos? The tech is super cool, but I struggle with the "why, really?". Does anyone benefit besides platform operators?


I thought that the important barrier to building these sorts of systems is the cost of (indirectly, the energy required for) training the model. Is that still correct?

How does a Free Software or "Open Source" project get around that?


Distributing the trained models.
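
That is, you skip the pretraining cost entirely: download weights someone else paid to train, and at most fine-tune them cheaply on your own corpus. With aitextgen (mentioned upthread), roughly like this (arguments from memory of its README, so treat as approximate):

    from aitextgen import aitextgen

    ai = aitextgen()          # pulls the pretrained small GPT-2, no training cost
    ai.train("input.txt",     # your own text corpus for cheap fine-tuning
              num_steps=3000,
              batch_size=1)
    ai.generate(prompt="Hello")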


I should have read the article more carefully!!

> The Eleuther project makes use of distributed computing resources, donated by cloud company CoreWeave as well as Google, through the TensorFlow Research Cloud, an initiative that makes spare computer power available, according to members of the project.


> the Eleuther team has curated and released a high-quality text data set known as the Pile for training NLP algorithms.

This includes HN. From the component table in the paper [i]: HackerNews, 3.90 GiB, 0.62% of the Pile.

Which, if SciFi has taught me anything, means we are all uploaded now and will live forever.

[i] https://arxiv.org/pdf/2101.00027.pdf


“Wintermute was hive mind, decision-maker, effecting change in the world outside. Neuromancer was personality. Neuromancer was immortality. ... Wintermute [had] the compulsion that had driven the thing to free itself, to unite with Neuromancer.”


Couldn't Google build the world's most powerful NLP AI? They scraped the whole web and have DeepMind to pull it off, on top of Google's powerful and massive data centers.


They probably could, but what for?

They did develop BERT and use (used?) it for parsing search queries [1]. They probably use NLP models in the ranking algorithm too. But those use cases are about getting the best result possible within the throughput/latency requirements, which necessarily makes them less "powerful" than models like GPT that pay little attention to performance.

https://blog.google/products/search/search-language-understa...


I forgot about BERT, thanks for reminding me.

Google, for example, could monetize an NLP AI framework just like they monetize Kubernetes: https://cloud.google.com/kubernetes-engine

If OpenAI and Microsoft are licensing and monetizing GPT-3, why wouldn't Google want to compete with them when they have more data? Nothing can beat the amount of data that Google and Google Search have gathered over the last 20 years.


Did anybody set up a web interface for testing this already?


I apologize for joking on Hacker News, but go to Google and type in anything to do with a consumer product comparison, and you'll get a billion results of webpages filled with text indistinguishable from AI-generated blather.


I believe we are reaching a singularity.

Like 90% of content is written by marketers for bots; SEO, they call it. Now we can take out the middle man: bots writing crap for other bots. And then we use that content to train more bots to write even crappier blog spam. And finally the bots decide the actual recipe is no longer needed on the recipe blogs, and they kick us off the internet.


I wish it were possible to break down how much of Twitter is bots reading other bots and then creating content for bots. They would never admit how many of their 'users' are bots, but it has to be significant.


Dude it took me a couple days to think about this but now it's clicking...

In a way, humans are about to lose the Internet.


While the fraud implications of convincing generative text are quite daunting, it's great to see progress in this field.


“Man GPT-3 is such an inaccessible naming convention and it uses a prohibitive license”

Solution:


AI can also convert audio to text; one example is https://audext.com/. What do you think?


That's not an AI.


It's funny how behind the times Wired is getting. Even my parents know how scary good these text models are getting.


The fact that there are no advanced AI chat bots because they might (I mean they will, lol) say something offensive is absurd. We are such babies.

General AI is already here. It should be implemented on Twitter or wherever and used to teach us about ourselves: driven by engagement, untethered by morals. A dispassionate glimpse into what sells. An AI that exploits our engagement, for good or evil.

The bot would become infamous and in due course banned. Teaching us even more.

But we are so fragile.


What are you on about? There are AI driven chat bots. When they aren’t used it’s not because they might say something offensive. General AI is not here by any definition or redefinition of General AI. We are not fragile.


AI driven chat bots are routinely deployed, yes. But it's also true that at least one bot generated content that spooked its owner:

"The AI chatbot Tay is a machine learning project, designed for human engagement. As it learns, some of its responses are inappropriate and indicative of the types of interactions some people are having with it. We're making some adjustments to Tay." (Microsoft statement)

https://www.theverge.com/2016/3/24/11297050/tay-microsoft-ch...


This is because Twitter users, some coordinated on 4chan's /pol/ board, decided to train the bot on extreme racist input:

https://en.wikipedia.org/wiki/Tay_(bot)#Initial_release


I mean we're very fragile, which is why if we had General AI we shouldn't release it at all, but that's of course not what OP was saying lol.


General AI is not here: https://openai.com/blog/image-gpt/ lol


This type of snarky tone isn't that constructive and doesn't add to the debate.

That being said, I don't see how the existence of image-gpt supports the notion that General AI exists. Image-gpt couldn't (for example) write the next version of itself.


Ah yes, the only possible issue with releasing fully general AI is that it might say something offensive. Not because we don't have it at all, not because if we did we shouldn't just let it out like a lion in the gazelle pen to see what it does, because of those snowflakes!


Wrong, GPT-4 is new and has not been implemented as a chat bot.

Also wrong that previous chat bots were not shut down for being offensive. https://en.wikipedia.org/wiki/Tay_(bot)


Mmm, and was Tay General AI also?


One data point in a sea of billions does not a pattern make.


GPT-4 doesn’t exist mate.


What do you mean by General AI?

If you mean AGI (artificial general intelligence) it's definitely not here yet.



