This is a newbie question, but why is the lidar/radar necessary? Is it solely because computers aren't yet able to extract as much information from video as humans can from sight? Theoretically, could self-driving cars use just video with enough compute resources and the right ML algorithms?
There are a lot of people here very confidently stating that LIDAR isn't necessary but seemingly forgetting to link to any actual evidence of fully autonomous vehicles that work without it. The issue is that human vision isn't just vision, it's vision linked with a ridiculous general intelligence that we're nowhere close to reproducing. So humans are great at categorising the objects they see and reasoning about how they'll behave, which is why we don't generally need LIDAR.
So it's just as much about how humans can reason from sight as it is about how good their sight actually is. It's very possible that by the time you've built a computer capable of the same sort of reasoning that humans do, you've actually solved, well, artificial general intelligence and you're no longer designing a car at all - because the AGI took that job ages ago.
We, humans, are very good at some things that will probably take decades for computers to get anywhere near our level. For instance, you're driving a car and there's a huge truck ahead of you transporting big logs. There isn't much traffic, and eventually you get close enough to the truck to see that the structure holding the logs is somewhat loose. Your brain can instantly create a scenario of what would happen if the logs started falling on the road. This alone gives you enough info to act upon. What you do (slow down, or try to overtake and warn the driver at the same time) is a different question. In probably less than a second your brain recognised the type of cargo the truck had, its physical properties, some outcomes should parts of the truck fail, and so on. For us it's easy; for a computer this would probably take days' worth of computation.
Not just humans, but even the tiniest of insects are capable of navigating with ease in complex environments at a speed and adaptability we are not even close to matching. I'm pretty sure that having the raw intelligence of an ant brain in a Mars rover would expand our exploration capabilities by several orders of magnitude.
>There are a lot of people here very confidently stating that LIDAR isn't necessary but seemingly forgetting to link to any actual evidence of fully autonomous vehicles that work without it.
Funny, I was not aware of any fully autonomous vehicle that actually works with or without it.
I'm not sure that humans use that general intelligence when driving, or whenever decisions must be made quickly. I think the brain uses shortcuts and does not really reason about things. And those shortcuts likely work similarly to trained neural networks. That's one reason why novice drivers have problems: they don't have those shortcuts in their brains yet, so they have to reason, and that takes valuable time and attention.
Though I think that robotic automobiles must perform better than humans, so equipping them with better sensors makes perfect sense, regardless of algorithms.
I think it's always best to avoid comparing neural nets with biological neurones and their networks. The comparison just doesn't hold since they are completely different in almost every way.
Also I think that it actually proves that humans use their general intelligence to drive, since humans can learn on the go and add almost any type of new knowledge and act upon it without even realizing it. The brain is not looking for anything in particular, just any type of information that would be relevant to driving.
According to your average SV pitch deck, we are 6-10 months away from reproducing general intelligence. Add in a sprinkling of Pascal's Wager, and your funding is almost guaranteed.
Which goes back all the way to the Dartmouth workshop, which hoped to get most of the work done in 6-8 weeks. That was in 1956. Is most of the work done? Well there you go. (Your hint that it's a good way to get paid for such work holds, of course, but the expectation that Problems will get Solved...unlikely)
There's a whole lot of challenging information that is completely natural and intuitive for a human to understand but fiendishly difficult for a ML algorithm to figure out. There are some cues that I'm sure we probably won't be able to use until we create a genuine artificial general intelligence. If you're driving and you see someone standing at a crosswalk, it's intuitive just by looking at them whether they're waiting for you to pass, about to walk out into the road, panhandling, etc. You can put yourself in their shoes and make a reasonable prediction as to what they are thinking about doing. On a previous HN thread there was a commenter who had a run-in with a Waymo car while riding a bike. He was coming up to a 4-way stop and yielding to the Waymo car. If he was balancing on his pedals, still stationary, the Waymo car would stop and not proceed through the intersection, apparently interpreting his pose as if he were about to proceed through the intersection himself. A human driver wouldn't have an issue with that, but you can imagine the training data would show that when a bicycle is stopped, the rider puts a foot on the ground.
Generous helpings of lidar and radar to augment cameras is a crutch to help compensate for the lack of 500 million years of unsupervised learning that went into our visual cortex.
It might look intuitive just by looking at them, but good drivers won't trust that intuition, because humans are a very unpredictable species. So good drivers will slow down to have enough time to react if necessary. You should not trust a "reasonable prediction" when the stakes are life and death.
Which means "do not drive, ever." The actual risk appetite for (human) driving is very different from the advertised one; the heated discussions on SDV just bring this to light.
For those specific cases no, but it makes a great source to identify physical objects and their placement in the world. Obviously that can be done to a sufficient degree with just stereoscopic vision, but look at all of the autopilot fatalities from Tesla. All too often vision and radar data gets misinterpreted as "That must be an object to the side of the road" right up until it crashes into it. With lidar you can be sure you're not just looking at a radar reflection or getting the perspective wrong on a camera. Almost all of the large objects that you want to make sure you never hit will show up well with lidar.
Just about every one of those fatalities can be summed up as "Tesla ignores large stationary object directly ahead". Lidar would have detected all of those objects and most likely prevented every one of those accidents. I think Tesla currently has the best vision and radar only system out there, so either the state of the art doesn't quite cut it yet without lidar, or there's a ML engineer at Tesla that really really hates fire trucks.
Or Tesla doesn't actually provide Full Self-Driving yet and people shouldn't be watching their phones.
It is noticing stationary objects, because sometimes it brakes when the car approaches a bridge going overhead, which is also bad when the car following too closely behind doesn't.
You have to ignore some objects in front of you (even ones heading directly towards you) because you're going round corners, so it's never cut and dried.
It isn't necessary. As you suggest, humans have only normal eyes, not laser eyes, and we still manage to drive.
Yes and no. It's also just nice to have more different kinds of data available. Just because humans don't have laser eyes doesn't mean we can't try to do better than that.
Tesla in particular, and others, are moving towards getting depth from ML. Yes, you can do dumb coincidence-finding, but there are a lot of corner cases (leafy objects, specular reflections, etc.) that screw this up. Humans don't just use coincidence finding, but use all kinds of other cues (size, texture gradients, monocular parallax, shadows, linear perspective, attenuation from haze, etc.) to infer depth.
Maybe not just, but we do use it, except we do it by converging our eyes for _actual_ coincidence within a small focal center instead of ambiguous coincidence everywhere in the field of view like CV stereo typically tries to do. Gimbaled cameras that (metaphorically speaking, of course) "look" where drivers are supposed to look with an attention model instead of being statically coplanar could do this too.
> except we do it by converging our eyes for _actual_ coincidence within our focal center instead of ambiguous coincidence everywhere in the field of view.
Only at very close range. That's convergence, and it's mostly a phenomenon at 5 m and less (not so relevant at driving ranges).
Tesla is using data from the radar to annotate the images from the cameras with distance before running it through the ML training, but for driving it is using a combination of all (stereo vision, ML and radar) to produce the final number.
The human neural network is of course fine tuned to this and my father who is blind on one eye will actually move his head sideways a bit to judge distance when he is driving :)
Yah. Waymo is doing the same with the lidar, which provides both higher accuracy and higher resolution "ground truth" distance data. And yes, everyone uses sensor fusion and filtering, not just "ML".
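The sensor-fusion point above can be sketched concretely. Below is a minimal inverse-variance fusion of two range estimates — the building block behind the Kalman-style filters these stacks use. The function name and numbers are mine for illustration, not from any vendor's actual system:

```python
def fuse(z_cam, var_cam, z_radar, var_radar):
    """Inverse-variance weighted fusion of two independent range estimates.

    Each estimate is weighted by 1/variance, so the noisier sensor
    contributes less; the fused variance is smaller than either input's.
    """
    w_cam, w_rad = 1.0 / var_cam, 1.0 / var_radar
    z = (w_cam * z_cam + w_rad * z_radar) / (w_cam + w_rad)
    var = 1.0 / (w_cam + w_rad)
    return z, var

# e.g. camera says 52 m (variance 4), radar says 50 m (variance 1):
# the fused estimate lands much closer to the radar's reading.
```

A real tracker wraps this in a Kalman filter that also propagates the estimate between frames, but the weighting idea is the same.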
I'd argue that the majority of victims are the result of humans suppressing their judgment. E.g. not because a human was incapable of driving safely, but because they purposely chose to drive unsafely: overtaking where visibility is limited, speeding when conditions do not allow for it.
It isn't "necessary" for small values of "necessary", ie if you have a human brain hooked up to human eyes. It is extremely necessary if you only have a computer and some primitive software, like we do now, as in "self-driving won't work as well otherwise".
Yet depth from stereo is, to some extent, AI, because the matching of pixels that correspond to the same physical point is not trivial and uses heuristics.
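To make the "not trivial" part concrete, here's a toy sum-of-absolute-differences block matcher — the crudest form of the heuristic matching described above. Real stereo pipelines add regularization, sub-pixel refinement, and occlusion handling; this is just a sketch:

```python
import numpy as np

def disparity_sad(left, right, max_disp=8, win=3):
    """Per-pixel disparity via sum-of-absolute-differences block matching.

    For each pixel in the left image, slide a small window along the same
    row of the right image and pick the horizontal shift with the lowest
    matching cost. Depth then follows from Z = f * B / d
    (focal length * baseline / disparity).
    """
    h, w = left.shape
    pad = win // 2
    disp = np.zeros((h, w), dtype=int)
    for y in range(pad, h - pad):
        for x in range(pad + max_disp, w - pad):
            patch = left[y - pad:y + pad + 1, x - pad:x + pad + 1]
            costs = [np.abs(patch - right[y - pad:y + pad + 1,
                                          x - d - pad:x - d + pad + 1]).sum()
                     for d in range(max_disp)]
            disp[y, x] = int(np.argmin(costs))
    return disp
```

On a well-textured synthetic pair shifted by a known amount this recovers the shift exactly; on real images, the reflective and textureless surfaces mentioned elsewhere in the thread are exactly where the cost minimum becomes ambiguous.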
Even in broad daylight, with high-res video and ML algorithms, we cannot accurately extract all the information that's needed. For example, the velocity at which another vehicle is moving, or the distance to an obstacle, is very difficult to extract from a camera alone, and impossible at night or in extreme weather conditions. Hence we also need radar and lidar for ADAS and self-driving applications.
Radar: It can accurately measure the distance and velocity of objects around the ego vehicle and can also track objects. It works well in all weather conditions (day/night/rain/fog etc.).
Lidar: Good distance measurement, rich in data (3D point cloud) for ML, OK at doing classification (pedestrian, bicyclist etc.) even at night. But it's an expensive sensor.
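The radar velocity measurement comes essentially for free from the Doppler shift of the returned signal. A sketch, assuming a typical 77 GHz automotive radar carrier (the shift value is illustrative):

```python
def doppler_velocity(f_shift_hz, carrier_hz=77e9, c=3e8):
    """Radial velocity from a Doppler frequency shift: v = f_d * c / (2 * f_c).

    The factor of 2 is there because the wave travels to the target and back,
    so the shift is picked up twice.
    """
    return f_shift_hz * c / (2 * carrier_hz)

# A 10 kHz shift at 77 GHz corresponds to roughly 19.5 m/s (~70 km/h).
```

This is why radar gives velocity directly per measurement, whereas cameras and lidar have to difference positions across frames to estimate it.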
Besides the other reasons, it's also that the dynamic range of the eye is way higher than that of the cameras they can use here. And humans have access to mechanisms that allow them to handle washouts.
They're just building in the equivalent of putting in your sunglasses or looking off to the side when the sun is in your eyes or readjusting your seat position so it isn't hitting your eyes.
There's no practical reason to limit autonomous vehicles to human-visible signals. The point is to make a car that drives safer than a human. Since we can't make a car that's smarter than a human, we can at least even the odds and make it see much better.
I’m not convinced that the eye is conclusively better than any camera out there. The human brain is incredible at stitching together visual input into a coherent view of the world. The retina has high resolution but covers an incredibly small field of view; rapid eye movements make it possible to fake a larger effective field of view. And I’m not convinced that the retina’s resolution is actually better than that of high-end camera systems. The eye’s performance in the periphery is decidedly worse than that of a wide-angle camera.
Dynamic range is a challenge for cameras, but we have high-dynamic-range imaging nowadays (both in software, i.e. exposure combining, and hardware). I don’t think the eye is significantly superior here.
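For the curious, exposure combining can be sketched in a few lines: weight each bracketed frame by how well-exposed each pixel is, then average the exposure-normalized values. This is a toy version of what HDR merge algorithms (e.g. Debevec- or Mertens-style) do; the linear sensor response and the hat-shaped weight are simplifying assumptions:

```python
import numpy as np

def combine_exposures(frames, exposure_times):
    """Merge bracketed 8-bit frames into a scene-radiance estimate.

    Assumes a linear sensor response. Each pixel is weighted by how
    well-exposed it is (a hat function favoring mid-tones), so saturated
    or near-black pixels defer to the other exposures; dividing by
    exposure time puts all frames on a common radiance scale.
    """
    acc = np.zeros_like(np.asarray(frames[0], dtype=float))
    wsum = np.zeros_like(acc)
    for img, t in zip(frames, exposure_times):
        img = np.asarray(img, dtype=float)
        w = 1.0 - np.abs(img / 255.0 - 0.5) * 2.0  # 0 at 0/255, 1 at mid-gray
        acc += w * (img / t)
        wsum += w
    return acc / np.maximum(wsum, 1e-6)
```

A pixel that clips to 255 in the long exposure gets weight 0 there and is recovered from the short exposure instead, which is the whole trick.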
Low-light used to be a strength of the human visual system, but I think modern computational imaging systems have caught up (plus, human night vision was really never that good compared with other animals).
So in short: I dispute the idea that cameras don’t have the acuity of the human visual system, nowadays. I’d like to know in which aspects you believe human eyes to still be superior, from an optical point of view. Obviously - the human brain and visual cortex is something that computers are nowhere close to.
> So in short: I dispute the idea that cameras don’t have the acuity of the human visual system, nowadays. I’d like to know in which aspects you believe human eyes to still be superior, from an optical point of view.
Dynamic range. Your eyes can pick up subtle details in a scene with very bright lights, and very dark shadows.
No video camera currently exists that can take a good video from inside a city, at night, with a starry night sky. Either the stars, or the lights, or the shadows are going to look like crap. Your eyes can trivially handle such a problem.
The ability to capture both stars and a daylit scene using the same sensor is pretty hardcore. But yes, there's some fancy processing layer which is far harder to replicate than the optics. Last time I tried, pointing a generic (not specialised for the task) camera at the night sky got me blackness, but my eyes see stars.
On the optics, I'd say the human eye is specialized, but not on the whole outstanding. Fantastic dynamic range, lousy spatial and time resolution, a very good tracking mount. It's just compromised on the things (resolution) that can be filled in with fast scanning over a scene + post-processing.
Computer vision is a very noisy application. Unless you have some extremely good software filters, that noise will reduce reliability, which in a car may end in disaster. Even humans are fooled by it once in a while.
And even if Google has those extremely good filters, there is an entire different order of magnitude of testing until they can be sure. They are probably just trying to get something out of the lab by using reliable sensors, and leaving optimizations for later.
Why should we pursue biomimicry to that extent? My go to example is to compare aircraft to birds, so I could rephrase your question as why are the engine/fixed wings necessary? Is it solely because material engineering hasn't advanced yet to enable airplanes to flap their wings to stay airborne like birds?
Human eyes and brains are completely different from solid state electronics with different constraints and advantages.
There is in fact a pretty long history of biomimicry in aviation though. I've met many aerospace engineers (including a former NASA chief scientist) who would take your question very seriously. The answer may well be "no, that isn't the only reason", but it's not trivial to arrive at that answer.
Self-driving tech is still at an immature stage while the bar set by public is super high; even human-level self driving might be unacceptable for pervasive adoption. Anyway you can use both Lidar and camera. It's not too late to worry about this cost-performance trade-off after making it actually work.
It isn't necessary, and the best example of this is Tesla's Autopilot. It uses no lidar component, and Elon Musk's bet is that self-driving cars can use just cameras, basic radar, and ultrasonic sensors with enough compute resources and the right ML algorithms to perform better than a human.
He's been a long-time opponent of lidar because of the costs involved and the aesthetics. He's also betting that the amount of data Tesla receives from its customers, and the neural net they have, can achieve autonomous driving with its current hardware stack.
“In my view, it’s a crutch that will drive companies to a local maximum that they will find very hard to get out of,” Musk said. He added, “Perhaps I am wrong, and I will look like a fool. But I am quite certain that I am not.”
"Despite being a fancy and expensive technology, LiDAR provides surprisingly little advantage over a combination of cameras and radar. Radar, for example, is much better in the rain and other limited visibility scenarios, because it is based on radio waves rather than light waves. Radio can penetrate through some objects and bounce back from others, thereby “seeing” the environment along a different dimension."
Tesla is not a good piece of evidence. They registered a grand total of 12.2 autonomous miles in 2019. Their "Autopilot" is a particularly fancy driver assist system; if you take their marketing at face value and treat it like an autonomous driving system you are putting yourself and everyone on the road at risk.
That's fair, I've just taken the marketing at face value.
Though I did end my answer by asking whether lidar adds incremental or exponential value. Do you think it adds exponential value?
I'm not an autonomous car engineer, so I don't understand the nuances, but from whatever basic information I've read it doesn't seem like lidar adds exponential value.
Exponential. It gives range and shape data, which a pure-optical system needs to infer from a 2D image. This kind of image processing is still an open problem in ML.
The usual metric for self-driving car success is "disengagements per mile", ie how frequently a driver needs to intervene to avoid a crash. From my anecdotal readings of Tesla Autopilot reviews, it's on the order of 0.1 per mile. For Waymo and Cruise, it's on the order of 0.01 per THOUSAND miles. That's a very different definition of "driver" than the one that Tesla Autopilot requires.
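To put those two rates on the same footing (using the comment's anecdotal figures, not official report numbers):

```python
def miles_per_disengagement(rate, per_miles=1.0):
    """Convert a disengagement rate into miles driven per disengagement."""
    return per_miles / rate

tesla_like = miles_per_disengagement(0.1)           # ~10 miles between interventions
waymo_like = miles_per_disengagement(0.01, 1000.0)  # ~100,000 miles between interventions
```

That's a gap of roughly four orders of magnitude, which is why "both need a safety driver" hides very different meanings of "driver".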
I don't know the total number of miles on all Teslas on Autopilot, but it has had much more than one accident.
EDIT: and that Waymo crash was not a self-driving error; it was T-boned by a human-driven car running a red light.
Fair, depth perception from 2D isn't there yet through ML, but mixed with radar, can it be effective enough?
The reason I assumed it works is that the lidar in the article above seems more like a redundancy, because their camera system has the short range covered and radar has the long range covered. Lidar seems to augment them.
Though the disengagement rate is a great stat; that definitely shows how much better Waymo is compared to Tesla.
> but mixed with radar, can it be effective enough?
The usual solution is actually lidar + optical; lidar gives much better spatial resolution than radar, which is why it's been the standard going back to the DARPA challenge. You really want to have good spatial resolution in order to distinguish e.g. bikers and tail-lights and road signs for your optical systems, which radar typically isn't good enough for; that's the point of that qualifier in "imaging radar". Still probably worse performance (i.e. time and spatial resolution) than lidar, but better range and weather resistance.
(The previous generation of Waymo cars already had one lidar on top; the radar and the close-range lidars are the new additions.)
Their AutoPilot has driven billions of miles. Yes, the driver must still be paying attention and ready to take over. That doesn’t mean the system wasn’t driving.
Miles per disengagement I’m sure is not too high. That would be a good metric to have. But total miles is still ~2 billion.
> He's been a long-time opponent of lidar because of the costs involved and the aesthetics.
I can't help thinking that, whatever the merits of Lidar, Musk is boxed in, because Tesla has sold hundreds (tens?) of thousands of "self driving packages" for cars not equipped with Lidar, so changing course would not just mean raising prices on new cars, but retrofitting large numbers of existing cars at a ruinous cost.
> Elon Musk's bet is that self-driving cars can use just cameras, basic radar and ultrasonic sensors with enough compute resources and the right ML algorithms to perform better than a human.
I'm not sure that unsubstantiated claims from Elon Musk are actual evidence that lidar isn't necessary.
It's substantiated by the fact that humans don't have radar and ultrasonics, just two cameras on a swivel and a lot of signal processing, and they succeed at operating a car to five-nines reliability measured in miles traversed. So Musk's bet isn't completely bonkers; he knows of at least one reference system that does the task with fewer sensors than even shipped with the Tesla.
... but we do want the SDC to do better, and there are failure modes that human perception is also vulnerable to generally. In addition to closing the gap faster on solving the problem without a copy of the human perception wetware, the LIDAR signal might also improve on those perception error states and be worth keeping in the design even if it could be done with cameras alone (or camera + radar + ultrasonic).
Definitely agree, and I think that's the devil hiding in the details of Musk's bet that is worth surfacing: he's making the bet "We can just build a computer as good at this complex, highly-variable task as a human being," and it's a bet people have been making and losing for decades.
Some day, someone will make that bet and be right. I haven't put my money on this team and this project. ;)
From the systems I've worked with, it's usually AND, not OR: you use both a lidar and a radar. The radar images I've seen were quite lousy and far from immune to interference.
The problem is Tesla autopilot only works on highways and has been implicated in a number of crashes. Waymo seems to be taking the much more cautious approach of using every advantage they can get. Perhaps one day there will be self driving cars without lidar. But for now I think Waymo's results speak for themselves.
I think it’s absolutely fair to expect autonomous vehicles to have awareness of a probable side collision in an intersection and be able to speed up or slow down to avoid it.
I often see oncoming cars rushing to make a left turn even after their arrow has turned yellow or red, and so I know not to enter an intersection even though my light has turned green.
Autonomous systems in theory should be better than humans at this because they can track all surrounding objects and trajectories, not just ones they are looking at with one set of eyes.
I think the accident rate is kind of a meaningless stat. You can have a low accident rate by carefully controlling the conditions under which you test. Not many accidents means they aren't pushing the envelope. That's probably a good thing on public roads. It's also why the system is not available for general public use except under extremely controlled routes and close (remote) supervision.
I think it’s great we have (at least) two mega-companies in a race trying different approaches to reach a solution. There are good points for and against both approaches. This is what makes life interesting, you can’t just run the numbers to predict the future.
Linking to an article about a Waymo car being hit in a side collision seems pretty disingenuous when comparing to the Tesla accidents that people usually talk about.
A camera (or a binocular camera setup) could only ever return depth planes. Lidar returns true 3d forms. Also, depth plane evaluation from cameras is not 100% reliable. Things like reflections, glare and suchlike screw it up.
This is nonsense. There is no difference between the data that binocular cameras return and LIDAR, other than that LIDAR is more reliable and accurate. Your distinction between "depth planes" and "true 3d forms" is gobbledygook.
Maybe I did not express myself accurately enough. I work in VFX, and my language is not always as precise as it could be. But certainly for us this is the difference in application.
Using a two-camera input, the best we could hope to get was a depth map. Typically, it is reliable for at most one or two levels of depth, useful for foreground/background separation. Maybe also a crude depth map, useful for fog. With lidar we could do all that plus get normal information (i.e. face orientation).
However, I concede your point. The core difference is one of accuracy.
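For what it's worth, the "depth map vs. true 3D" distinction can be made precise: a camera depth map back-projects into at most one 3D point per pixel ray (a 2.5D surface, nothing behind the nearest visible surface), whereas a lidar sweep returns 3D points directly. A minimal back-projection sketch with a pinhole camera model (the intrinsics fx, fy, cx, cy here are illustrative values, not from any real rig):

```python
import numpy as np

def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project a dense depth map into a 3D point cloud.

    Pinhole model: a pixel (u, v) with depth Z maps to
    X = (u - cx) * Z / fx,  Y = (v - cy) * Z / fy.
    Exactly one point per pixel ray comes out, which is the practical
    sense in which stereo gives a "2.5D" surface rather than a full
    point cloud.
    """
    h, w = depth.shape
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    x = (u - cx) * depth / fx
    y = (v - cy) * depth / fy
    return np.stack([x, y, depth], axis=-1).reshape(-1, 3)
```

The accuracy difference then compounds: any error in the estimated depth shifts the whole back-projected point along its ray.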
A fully autonomous driving system that was only as good as a human would not gain much traction and the company would be on the hook for some serious legal damages if it became popular.
But arguably very little progress (none?) in technology, automation, and industrialization has been through dogmatic replication of biological systems.
Yeah basically LIDAR is better than vision for working out the 3D structure of the world. You can do it from vision, but it's really really hard.
We don't have a working driverless car with nice easy LIDAR data - it would be really silly to try and make one the hard way first, using only vision data (yes, I know about Tesla).
Currently it's very difficult to reason about object geometric dimensions / movement from video alone. We (humans) are decent at it and we still make mistakes. "The right ML algorithm" for this is basically a problem as hard as achieving singularity.
Picture this scenario:
While looking at a TV screen, what is the difference between a photograph of a car (on the screen) and a real car (on the screen)?
The only thing a camera can offer is input for pattern recognition. Lidar/Radar offers context.
I'm not suggesting that all you need to fool computer assisted driving is a picture of an empty road taped to the front of the camera; but if the computer can't tell the difference between an optical illusion and reality then I think they need more data inputs.
We don't need computers to be able to handle white-out blizzard driving conditions, but that 'rare' occurrence perfectly illustrates the singular limitation of a camera-based pattern-recognition system.