I signed up for OpenAI's ChatGPT tool and entered a query like 'What does the notation 1e100 mean?' (just to try it out). When it displayed the output, it printed the reply slowly, as if it were being drip-fed to me, and I thought: what? surely this could be faster?
Maybe I'm missing something crucial here, but why does it drip-feed answers like this? Does it have to think really hard about the meaning of 1e100? Why can't it just spit the answer out instantly, without the delay, like the near-instant Wolfram Alpha?
Under the hood, GPT works by predicting the next token given an input sequence. At each step a single token (roughly a word or word fragment) is generated, conditioned on all the previous tokens.
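To see the shape of this, here's a toy sketch in Python. `toy_model` and its canned continuation are made up purely for illustration, but the loop structure is the point: every token requires another full pass over the sequence so far.

```python
def toy_model(tokens):
    # Stand-in for a transformer forward pass: it reads the whole
    # sequence and emits exactly one next token. The continuation
    # below is hard-coded just to make the example self-contained.
    continuation = ["1e100", " is", " 10", " to", " the", " power", " 100", "<eos>"]
    return continuation[len(tokens) - 5]  # 5 = prompt length in this toy

tokens = ["What", " does", " 1e100", " mean", "?"]
while True:
    next_token = toy_model(tokens)         # one pass per generated token
    if next_token == "<eos>":
        break
    tokens.append(next_token)
    print(next_token, end="", flush=True)  # the output "drips" out token by token
```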
It can, but then you'd wait longer before seeing anything at all. So one way to make answers feel faster is to stream the response as it is generated. And in GPT-based apps the response is generated token by token (a token is ~4 characters of English on average), hence what you're seeing.
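If you're hitting the API yourself, here's a minimal streaming sketch using the OpenAI Python SDK (openai>=1.0; the model name is just an example, swap in whichever you have access to):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True yields chunks as tokens are generated, instead of
# waiting for the full completion to finish.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name
    messages=[{"role": "user", "content": "What does the notation 1e100 mean?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # same drip-feed effect as the web UI
```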
It's a result of how these transformer models work. It's pretty quick for the amount of work it does, but it's not looking anything up; it's generating the answer one token at a time.
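You can get a feel for what "one token at a time" means with OpenAI's tiktoken library (cl100k_base is, as far as I know, the encoding the ChatGPT-era models use):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("What does the notation 1e100 mean?")

# Print each token separately: the reply is produced in chunks like these.
# Common short words tend to be a single token, while something like
# "1e100" usually gets split into several pieces.
for t in ids:
    print(repr(enc.decode([t])))
```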