Interesting approach! I've thought about a similar method after reading about the PLATO platform.
When I play astro‑maze, the delay is noticeable, and in a 2D action game such delays are especially apparent. Games that don't rely on tight real‑time input might perform better. (I'm connecting from Europe, though.)
If you add support for drawing from images (such as spritesheets or tilesheets) in the future, and the client stores those images and sounds locally, the entire screen could be drawn from these assets, so no pixel data would need to be transferred, only commands like "draw tile 56 at position (x, y)."
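Such a command can be tiny on the wire compared to pixel data. Purely as an illustration (the opcode and field layout below are made up, not from any real protocol):

```python
import json
import struct

# Two hypothetical encodings for a "draw tile" command.
# The JSON variant is easy to debug; the packed binary variant is a few bytes.

def encode_json(tile_id, x, y):
    """Human-readable command message."""
    return json.dumps({"cmd": "tile", "id": tile_id, "x": x, "y": y}).encode()

def encode_binary(tile_id, x, y):
    """Compact command: 1-byte opcode, 2-byte tile id, 2-byte x, 2-byte y."""
    return struct.pack("<BHHH", 1, tile_id, x, y)

cmd = encode_binary(56, 320, 240)
print(len(cmd))  # 7 bytes, versus kilobytes of raw pixels per frame
```

Even hundreds of such commands per frame would be far smaller than streaming the framebuffer.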
Yeah... As people play and I watch their feedback, it's becoming clear to me that the main source of input delay is distance to the server. The whole game runs on a single machine in SFO, so the bad experience in Europe makes total sense.
I think this is inevitable unless I add some client-side prediction/interpolation.
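For what it's worth, the interpolation half is fairly cheap. A minimal, purely illustrative sketch (the names are made up) that renders slightly in the past by blending the last two server snapshots:

```python
# Client-side interpolation sketch: instead of drawing the latest raw server
# position, blend the two most recent timestamped snapshots for smooth motion.

def lerp(a, b, t):
    """Linear interpolation between a and b."""
    return a + (b - a) * t

def interpolate(prev_pos, prev_t, next_pos, next_t, render_t):
    """Blend two timestamped server positions for an on-screen position."""
    if next_t == prev_t:
        return next_pos
    t = (render_t - prev_t) / (next_t - prev_t)
    t = max(0.0, min(1.0, t))  # clamp: never extrapolate past the snapshots
    return lerp(prev_pos, next_pos, t)

print(interpolate(0.0, 100.0, 10.0, 200.0, 150.0))  # 5.0: halfway between
```

This hides jitter but adds a fixed render delay; prediction (simulating input locally and reconciling later) is the harder half.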
Also, thanks for the feedback! I will fix the Abstra landing page
Large text-to-speech and speech-to-text models have been greatly improving recently.
But I wish there were an offline, on-device, multilingual text-to-speech solution with good voices for a standard PC — one that doesn't require a GPU, tons of RAM, or max out the CPU.
In my research, I didn't find anything that fits the bill. People often mention Tortoise TTS, but I think it garbles words too often. The only plug-in solution for desktop apps I know of is the commercial and rather pricey Acapela SDK.
I hope someone can shrink those new neural network–based models to run efficiently on a typical computer. Ideally, it should run at under 50% CPU load on an average Windows laptop that’s several years old, and start speaking almost immediately (less than 400ms delay).
The same goes for speech-to-text. Whisper.cpp is fine, but last time I looked, it wasn't able to transcribe audio at real-time speed on a standard laptop.
I'd pay for something like this as long as it's less expensive than Acapela.
The sample sounds impressive, but given their claim that "Streaming inference is faster than playback even on an A100 40GB for the 3 billion parameter model", I don't think this could run on a standard laptop.
Thanks! But I get the impression that with Kokoro, a strong CPU still requires about two seconds to generate one sentence, which is too much of a delay for a TTS voice in an AAC app.
I'd rather accept a little compromise regarding the voice and intonation quality, as long as the TTS system doesn't frequently garble words. The AAC app is used on tablet PCs running from battery, so the lower the CPU usage and energy draw, the better.
I use Piper for one of my apps. It runs on CPU and doesn't require a GPU; it even runs well on a Raspberry Pi. I found a couple of permissively licensed voices that could handle technical terms without garbling them.
However, it is unmaintained and the Apple Silicon build is broken.
My app also uses whisper.cpp. It runs in real time on Apple Silicon or on modern fast CPUs like AMD's gaming CPUs.
I had already suspected that I hadn't found all the possibilities regarding Tortoise TTS, Coqui, Piper, etc. It is sometimes difficult to determine how good a TTS framework really is.
Do you possibly have links to the voices you found?
In my experience, this bug - lags and overheating when drawing with the Apple Pencil - has existed since iPadOS 16. When searching the web, I found lots of reports and no indication that it has been solved, even by hardware replacements.
In any case, HN's guidelines ask to use the original title of an article, unless it is misleading or linkbait. I'd agree that Apple's software quality has been going down.
I've used Apple's Automator app to add a new custom Quick Action which does exactly this. After right-clicking a folder, the right-click menu shows my custom Quick Action to create an empty text file.
This takes about 5 to 10 minutes to set up. You'll find instructions on the web or via an LLM. I just looked for a suitable article, but the ones I found differ subtly from my Quick Action. I asked ChatGPT and its instructions seem to be correct.
Am I blind or is there no mention at all of the GPT model he used?
The author states his conclusions but doesn't give the reader the information required to examine the problem.
- Whether the article to be summarized fits into the tested GPT model's context size
- The prompt
- The number of attempts
- Which information in the summary, specifically, is missing or wrong (he doesn't always state this)
For example: "I first tried to let ChatGPT one of my key posts (...). ChatGPT made a total mess of it. What it said had little to do with the original post, and where it did, it said the opposite of what the post said." He doesn't say which statements of the original article ChatGPT got wrong.
My experience is that ChatGPT 4 is good when summarizing articles, and extremely helpful when I need to shorten my own writing. Recently I had to write a grant application with a strict size limit of 10 pages, and ChatGPT 4 helped me a lot by skillfully condensing my chapters into shorter texts. The model's understanding of the (rather niche) topic was very good. I never fed it more than about two pages of text at once. It also adopted my style of writing to a sufficient degree. A hypothetical human who'd have to help on short notice probably would have needed a whole stressful day to do comparable work.
You write as if you’ve found a hole in the article’s argument. The lack of evidence is a hole in the reporting, for sure. The tone of your comment suggests you feel that by not publishing all their evidence, the author’s point is wrong (rather than under-justified). However, the example you use to back up your point also backs up the article’s point. The article’s point is that ChatGPT doesn’t summarise, it only shortens. Your example indicates shortening, but not summarising.
There are just so many articles of people whining about how ChatGPT can't do things, when they clearly haven't prompted it very thoughtfully.
So I think that’s why you see so many reactions like this.
I've found ChatGPT incredibly good at all sorts of things people say it is bad at, but you need patience, and you need to really figure out the boundaries of the task and keep adding guidance to the prompt to keep it on track.
The article makes it clear that there is a semantic difference between shortening and summarizing, and that, importantly, summarizing requires understanding, which ChatGPT most certainly does not have.
One example in the article: if you have 35 sentences leading up to a conclusion in the 36th, ChatGPT is very likely to shorten the earlier sentences and never actually surface the important concluding point.
You seem to be on the "statistical next token predictor" side. I'm more on the side of those who invented it (they should know), who think these machines can understand things.
In 1964, Joseph Weizenbaum created a chatbot called "Eliza" based on pattern matching and repeating back to users what they said. "He was surprised and shocked that some people, including Weizenbaum's secretary, attributed human-like feelings to the computer program." People are notorious for anthropomorphizing things and attributing to them qualities (including human-like ones) that they do not possess. [1,2] LLMs are a "statistical next token predictor" by their design. The discovery that coherent and interesting communications are relatively easily statistically modeled and reconstructed, given enough computing power and a large enough training corpus, does not therefore imply that these programs have latent thinking and understanding capabilities.
Just the opposite: it calls into question whether _we_ have thinking and understanding capabilities or whether we are complicated stochastic parrots. [3] The best probing of these questions is done at the limits of comprehension and with unique and previously unseen information. I.e., how do you comprehend and process previously unseen/unfelt/not-understood qualia? Not how you deal with the mundanity of interactions between people (which are somewhat trivial to describe and model). [4]
At what point does it become easier to just do the task yourself? I've pondered this question often and came to the conclusion that, at the current level of output, it's not worth it for me to tinker with it until I get sensible responses.
It depends on the task. Sometimes I have just given up when it really can’t get something.
But other times I’ve persevered and once it’s ‘got’ it, it can then repeat it as many times as I need. That’s the knack really. Get it to the point of understanding and then reuse that infinitely and save yourself a lot of time.
In the example I mentioned, ChatGPT 4 did keep all essential statements of my texts when reproducing shorter versions of them. For example, it often wrote one high-level sentence which skillfully summarized a paragraph of the original text. As far as I understand, this is what the author meant by 'summarizing' vs. 'shortening (while missing essential statements)'.
I was impressed at those high-level summaries. If I had assigned this task to several humans, I'm not sure how many would have been able to achieve similar results.
For example, looking at the ChatGPT link the author has, the model loaded 5 pages besides the one the author wanted. That is clearly going to cause issues, but the author didn't modify the prompt to prevent it. It was also a misspelled, roughly five-word prompt.
I don't see how you can draw conclusions from a model not reading your mind when you give it basically no instructions.
You need to treat models like a new hire you're delegating to, not an omniscient being that reads your intent on its own.
Why, if the author asks it to summarise a single webpage and gives the link, should ChatGPT go out and load 5 more? (One is the same page again; the others are short overview pages, so they won't have influenced the result much.)
And why all this talk about trying to engineer a prompt so that in the end the result is good? Should an actual usable system not just handle "Please summarise [url/PDF]"? That is, I suspect, what people expect to be able to do.
Summarize clearly means something different to the author and the people who think the model results are good. Everyone expects different things. Most people are used to others knowing their preferences and adjusting over time. Models do not unless you tell them.
To be fair, most of the commentary on both sides of the LLM conversation is pretty anecdotal, which increasingly looks like a structural problem given that any solid evidence goes into the training set in about an hour.
Definitely. Otherwise it would have required a lot more than a single blog post. It is an observation, not anything rigorous with a large number of examples and decent statistics.
In the comments, the author clarified that he used GPT-4 for the article.
> What the colleague used, I can ask, but I suspect standard ChatGPT based on GPT-4. But my test was with GPT-4 (current standard), so that would mean about 8000 tokens (or roughly 4000 words, I think?). That may have influenced the result.
I prefer to work intensely and collaboratively in an office.
This is how I'd do it: three in-office days, same weekdays for everyone (e.g. Monday to Wednesday), plus the choice of a 5-day or 4-day week.
An energetic, quietly humming work atmosphere, with incidental information sharing and a spirit of collaboration, with colleagues present and nearby, sounds best for me personally. Among other advantages, the presence of coworkers helps me focus.
Different strokes for different folks. Obviously, the prerequisite is that it's a team of nice people you like being around.
Sure, people who prefer to work from home would leave that company. That doesn't mean that this company will lack talent. People who do want to work like that will join it.
We're currently building a new AAC device which allows users to write/speak quite a bit faster, and we're encountering exactly the hurdles you mention. Would you be willing to exchange some helpful pointers about how to bring an AAC device to market? Contact info in my profile.
Did your game require realistic physics collisions? If not, this might be unnecessary complexity. Almost no 2D shoot'em up before 2000, and very few afterwards, went this route. The common method for a shmup is very simple rectangle comparisons.
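A minimal sketch of that rectangle comparison (the function and rectangle layout are illustrative, not from any particular engine):

```python
# Axis-aligned rectangle overlap test, the classic shmup hit check.
# Each rect is (x, y, width, height) with y growing downward.

def rects_overlap(ax, ay, aw, ah, bx, by, bw, bh):
    """True if the two axis-aligned rectangles intersect."""
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

# A 4x4 bullet at (10, 10) vs. a 16x16 enemy at (12, 8):
print(rects_overlap(10, 10, 4, 4, 12, 8, 16, 16))  # True: the boxes overlap
```

In practice many shmups also shrink the player's hitbox to a few pixels smaller than the sprite, which is just a matter of passing a smaller rectangle here.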
But if your space debris objects are supposed to collide and agglomerate realistically, and if the player ships are supposed to have difficulty pushing a cluster of heavy objects out of the way, then using a physics library is sensible.
The point is: you fly a small spaceship in an arena, you hide behind obstacles, you shoot with a small variety of weapons, and you use those weapons to either shoot the enemy directly or rearrange the map to make life difficult for them. The collisions need to be realistic because you need to be able to predict what will happen when you hit things a certain way.
It is just a concept we are playing with.
Another part of that concept is that this game is meant for small kids that can't read. There is not a single letter or digit in the entire game. No menu. You just start the controller and get immediately pulled into the game.
And another feature: we wanted the game to be fun because the controls feel fun and immediate. So we are experimenting a lot with what it means for controls to be enjoyable.
Feedback: Your introduction "From Mario bouncing off a Goomba..." might be a bit misleading IMO because most games like the classic Super Mario titles on NES and SNES do not require and did not use most of these calculations.
Game development beginners often have the wrong impression that they need rigid body collision calculations or a 2D physics engine like Box2D to handle collisions. That's true if you want to make a game like Pool or something with collapsing stacks of crates like Angry Birds.
But for a 2D platformer you only need to detect collisions by comparing (axis-aligned) rectangles and to handle collisions by changing the moving character's X and Y coordinates (to undo an overlap) or setting the character's Y velocity (after using the jump button, or after landing on a Goomba's head).
This also makes it easier for the developer to fine-tune exactly how moving the character should feel. (This includes inertia, but that inertia is usually not physically realistic.) Trying to use realistic physics as a gamedev beginner can easily lead to floaty and unsatisfying movement.
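As a purely illustrative sketch of that "undo the overlap" idea (the field names are made up):

```python
# Platformer landing check: if the falling character's feet passed the ground
# this frame, snap the character back on top and zero the downward velocity.

def land_on_ground(char_y, char_h, char_vy, ground_y):
    """Resolve a downward collision with a flat ground at ground_y."""
    if char_vy > 0 and char_y + char_h > ground_y:
        char_y = ground_y - char_h   # undo the overlap: feet on the ground
        char_vy = 0                  # landed: stop falling
    return char_y, char_vy

# A 16px-tall character at y=110 falling 5 px/frame, ground at y=120:
print(land_on_ground(110, 16, 5, 120))  # (104, 0): snapped onto the ground
```

Jumping and bouncing off a Goomba's head are the same trick in reverse: just set the Y velocity to a negative value, no physics engine involved.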
(By the way, opening abstra.io in a German-language browser leads to https://www.abstra.io/deundefined which shows a 404 error.)