I signed up for OpenAI's ChatGPT tool and entered a query like 'What does the notation 1e100 mean?' (just to try it out). When it displayed the output, it printed the reply slowly, as if it were being drip-fed to me, and I thought: what? surely this could be faster?
Maybe I'm missing something crucial here, but why does it drip-feed answers like this? Does it have to think really hard about the meaning of 1e100? Why can't it just spit the answer out instantly, without the delay, like the near-instant Wolfram Alpha?
Under the hood, GPT works by predicting the next token given an input sequence. At each step a single token (roughly a word or word fragment) is generated, conditioned on all the previous tokens.
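To see the shape of this, here's a toy sketch in Python. `toy_model` and its canned continuation are made up purely for illustration, but the loop structure is the point: every token requires another full pass over the sequence so far.

```python
def toy_model(tokens):
    # Stand-in for a transformer forward pass: it reads the whole
    # sequence and emits exactly one next token. The continuation
    # below is hard-coded just to make the example self-contained.
    continuation = ["1e100", " is", " 10", " to", " the", " power", " 100", "<eos>"]
    return continuation[len(tokens) - 5]  # 5 = prompt length in this toy

tokens = ["What", " does", " 1e100", " mean", "?"]
while True:
    next_token = toy_model(tokens)         # one pass per generated token
    if next_token == "<eos>":
        break
    tokens.append(next_token)
    print(next_token, end="", flush=True)  # the output "drips" out token by token
```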
It can, but then you'd wait longer before seeing anything at all. So one way to make answers feel faster is to stream the response as it is generated. And in GPT-based apps the response is generated token by token (a token is ~4 characters of English on average), hence what you're seeing.
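If you're hitting the API yourself, here's a minimal streaming sketch using the OpenAI Python SDK (openai>=1.0; the model name is just an example, swap in whichever you have access to):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

# stream=True yields chunks as tokens are generated, instead of
# waiting for the full completion to finish.
stream = client.chat.completions.create(
    model="gpt-3.5-turbo",  # example model name
    messages=[{"role": "user", "content": "What does the notation 1e100 mean?"}],
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta.content
    if delta:
        print(delta, end="", flush=True)  # same drip-feed effect as the web UI
```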
It's a result of how these transformer models work. It's pretty quick for the amount of work it does, but it's not looking anything up; it's generating the answer one token at a time.
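You can get a feel for what "one token at a time" means with OpenAI's tiktoken library (cl100k_base is, as far as I know, the encoding the ChatGPT-era models use):

```python
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")
ids = enc.encode("What does the notation 1e100 mean?")

# Print each token separately: the reply is produced in chunks like these.
# Common short words tend to be a single token, while something like
# "1e100" usually gets split into several pieces.
for t in ids:
    print(repr(enc.decode([t])))
```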