I recently tried Qwen VL and Moondream to see whether, off the shelf, they could accurately detect most of the interesting UI elements on the screen, either in the browser or in your average desktop app.
It was a somewhat naive attempt, but it didn't look like they would perform well without considerable additional work. I wonder if there are models that do much better, maybe whatever OpenAI uses internally for Operator, but I'm not clear how bulletproof that one is either.
These models weren't trained specifically for UI object detection and grounding, so it's plausible that if they were trained on just UI long enough, they would actually be quite good. Curious if others have insight into this.
Sharing this one as a corollary to the "The Future of Large Files in Git Is Git" post: https://news.ycombinator.com/item?id=44916783 - I didn't see any mentions of exclusive locks in the post, an often valuable aspect of version control of large files.
This is the major issue with the majority of this technology at the moment. There's a plethora of options available and soon to be unveiled by several startups who are talking up their tech... but they are almost all for "editing"/"after the recording" work. You have to have a complete recorded track you can pass into their software (usually by uploading to their service) and then it will crunch away at the file and work its magic.
The current real-time options I've found are... lacking. They are mostly fake/toys (not actually using voice cloning, just old-school pitch shifting) or tech demo videos, with a scattering of research papers which are highly variable in terms of "how easily can I reproduce this", ranging from "sure, if I want to waste money on a Google Colab instance" to "only works with a specific model of video card due to reasons".
If you know of any real-time (audio stream in -> audio stream out) voice cloning/transform/replacement tools, feel free to post about them in a reply. This is an area of tech I'm trying to keep on top of, and I'm only human, so I have no idea what new company or research I might miss.
Hey - ElevenLabs dev here. The quality above works with <1s latency, which is already sufficient for some real-time apps. On smaller chunks of text it can be as quick as ~500ms.
I actually interned at EA Mythic in summer 2007 and remember everybody out there being talented, obsessed with games (I miss those lunch chats, people would work on games and talk about games when not working on games) and great to work with. It felt idyllic at the time and didn't at all match the "EA spouse" experience I was half-expecting. Maybe it was too early in the studio's relationship with EA for that to become reality.
Too bad Warhammer Online ended up flopping and sealing the company's fate, given all of the hard work put into it. Not clear if the market had room for one more ultra-successful MMO after WoW at that point.
This is actually what Sweetwater does too when it comes to music gear.
I've gotten a few duds from Amazon in the past, but smaller, focused online stores like Sweetwater will inspect your gear for you before shipping it out to you, and will give you a direct line to a sales+support person. In fact they'll even give you the email of the entire executive team of the company if you want to contact them directly.
As much as I like free Prime shipping, for higher end gear I'd much rather have access to a real person with expertise in the gear I'm purchasing, and a brand name and reputation to protect. I was not a believer at first, staunchly sticking to Amazon, but everything I've gotten from them has been top notch and without any defects, so I became a convert.
Startup idea: create a high quality camera gear buying and selling experience for the web, with many protections and conveniences built in. Selling your gear on Craigslist and meeting with random strangers at McDonald's and Starbucks is pretty much the only real alternative right now and gets old pretty fast.
This was an issue for music gear too, but somehow reverb.com managed to address it and make it a pretty painless experience. Their customer service is excellent, and if one of the two parties is unsatisfied, they'll intervene and try to find a compromise. They send you boxes to ship your gear in, they set up shipping for you, they automatically track the shipment as it gets picked up etc. I've been hoping to find something similar for camera gear, but have had no luck so far.
The only downside is that the prosumer camera equipment world seems to be rapidly shrinking, so it might not be a great idea to step into this space right now, whereas there doesn't seem to be a dearth of people buying guitars, drum kit pieces and effects pedals.
Keh (https://www.keh.com/) is sort of like this, except that they function more like a second-hand shop--you sell them your gear, and they hold inventory that other people can buy, meaning you never actually interact with the eventual buyer. Because of this though, I imagine they take a larger cut than Reverb does (and certainly more than eBay).
As a buyer I've had zero problems with Keh the couple of times I've used them.
I've gotten quotes through a few of these sites in the past, and I was getting at best 50-70% of the value I would have gotten by selling in person through CL. I imagine it's in part because you ship it over to them, they have to inspect the items etc.
I think those services can be useful for buyers, but sellers who want to get the most for their gear will avoid them. Like you said, something like Reverb could compete on taking a smaller cut.
Yeah, it's a good concept and I would love an easy, secure way to sell items. However, they are quoting $965 USD for my Nikon Z6, which is out-of-box new and retails for $1,996.95 USD. That's pretty hard to swallow. For that I'll put in the time to sell it locally, meet and try in person, cash only.
Same with the area around Golden Gate Park in SF, being close to so much well-maintained and curated nature is delightful. The place is a real treasure.
The words "well-maintained" and "curated" don't usually rhyme very well with "nature", IMO. Real nature is by definition unmaintained. When it's maintained it becomes a park/garden, and loses more or less all its diversity. Fine, if you just want to look at some green leaves (which is great in a city) but we shouldn't call it nature just because something is alive there.
The Bay Area is also home to MIDI co-inventor Dave Smith and his hardware synth company Sequential. You can see his office and generous collection of synths from the street, if you roam around San Francisco's North Beach long enough.