How Adversarial Attacks Work

ChuckMcM · on Nov 2, 2017

This weakness is one that I think will plague self driving cars as sign recognition will be key and without some ability to insure that they cannot be dangerously fooled, it will be hard to get them certified. The canonical example is to make a no left turn sign recognize as a no right turn sign and have the car go the wrong way on a one way street.

Clearly there is a marketing opportunity for t-shirts that make you recognize as other things. Who doesn't want to show up in an image search for toasters on Google Images ? :-)

But my current best guess on how this issue will be addressed will be with classifier diversity an voting systems. While that just moves the problem into a harder and harder to synthesize data set (something that not only is adversarial to one classifier but gives the same answer on several), I believe it will get us to the point where we can trust the level of work to defeat them is sufficiently hard to make it a non-threat.

brudgers · on Nov 2, 2017

Based on the history of artificial intelligence, I am less optimistic about the long term prospects of any silver bullet technology, and that's how neural networks and related systems are currently positioned. Ultimately, rule based systems are rule based systems, our inability to articulate the explicit rules should not be mistaken for their absence. Ten or one hundred or a thousand rule based classifiers voting is more of the same, not something different. The black box still has the same inputs and the same outputs no matter how many times the handle of the Pumping Lemma is raised and lowered.

azernik · on Nov 3, 2017

Or, simply with better machine learning.

> The border between “truth” and “false” is almost linear. The first cool thing we can derive from it is that when you follow the gradient, once you find the area where the predicted class changes, you can be fairly confident that the attack is successful. On the other hand, it tells us that the structure the decision function is far simpler that most researchers thought it to be.

Humans can be fooled by optical illusions as well; but those illusions are much more limited and much more noticeable than most of these. My (very non-specialist) interpretation of the italicize clause is that a) vulnerability to these attacks is a continuum, not a binary, and b) the current ease of these attacks reflects the crudity of our current ML techniques.

theptip · on Nov 3, 2017

Agree, it seems to me that there's a sense in which this is a trivial issue; just throw more adversarial examples into the training set, and turn the crank until the classifier understands how to correctly handle those cases too.

There's also a sense in which this is extremely non-trivial; any agent (human or machine) can be subjected to adversarial attacks. They just look different right now, and our current systems are vulnerable to very simple ones.

It seems to me that improving an algorithm's resistance to adversarial attacks is much more feasible than improving a human's resistance to their own class of adversarial attacks.

ben_w · on Nov 3, 2017

Habe you seen the Monroe-Einstein illusion? There is a continuous transition from one image to the other, depending on the angular size. This feels relevant to your point.

https://static.independent.co.uk/s3fs-public/styles/article_...

tambourine_man · on Nov 3, 2017

I wouldn't call that an illusion, but more a consequence of a high-pass/low-pass filter at different distances with limited resolution.

What I find much more disturbing is that even though I measured and known those lines are parallel or that those shades of gray are the same, I can't "unsee" the illusion.

candiodari · on Nov 4, 2017

I have heard from the police in many places that humans have the same problem, and they have it in practice:

There are places in almost every police zone where people driving misinterpret the situation almost perfectly consistently. Which means in some cases that the exact same accident keeps happening with a certain regularity, mostly just involving cars (and generally in that case nothing's done about it), but sometimes involving bikers or pedestrians and in those cases usually after a while the roads are changed so it stops happening.

So here we are:

1) there are "adversarial attacks" on the human mind, in existing road situations

2) those attacks were built into our road network, by humans, and presumably accidentally

3) they systematically lead to accidents and cause deaths, mostly people who weren't even the human misinterpreting the situation. Also, a lot of economic damage is caused.

4) we do one of two things to fix it:

a) we just accept this as a fact of life and don't care

b) we change the road situation until we purely accidentally hit one that doesn't get misinterpreted.

So how to deal with this for AI drivers ... I know ! How about the exact same way ?

AI drivers, I've been stuck behind them in MTV traffic enough to know this, are far safer than good human drivers. They far exceed the ability average human drivers. And they wipe the floor with the very best humans where it comes to patience with fellow road users.

Why do we hold them to a 100% standard ? Nobody and nothing, human or otherwise, matches up to a 100% perfect standard. A stick you use to beat a dog will not have a 100% success rate, once every 10 years or so that stick will break and maybe even injure the guy holding the stick by bouncing around, and yet somehow that is acceptable ...

We need to talk about how good drivers need to be. 100% is simply not an acceptable answer.

projectileboy · on Nov 2, 2017

One possible mitigation could be the use of geocoded roadway metadata. I rented a car in France that had this - it would let me know when I was exceeding the speed limit based on my position. Not hyper-accurate, and so probably not a replacement for all road signs, but certainly good enough for many situations.

smogcutter · on Nov 2, 2017

Was thinking of this too, but what happens when the map and "ground truth" disagree? The machine would need to be able to identify the rare circumstances when it should trust the map over its senses. If it's capable of that, this wouldn't be a problem in the first place.

SomeStupidPoint · on Nov 3, 2017

When driving, that probably doesn't really matter:

If the map and road sign disagree, just do what everyone around you is doing (ie, go with the flow) until you recover agreement. It's not perfect, but stretches out attacks to having to compromise long stretches of road and multiple vehicle types. If the cars for the next mile somewhere can find pace, the whole group can; similarly, if any kind of car can find pace, they can all use that to set.

Obviously safety checks and such, but human drivers only loosely follow signs and often get behavioral cues from other cars -- why can't automated cars do the same when they get confused?

dTal · on Nov 3, 2017

I am now picturing a mile-long phalanx of self-driving cars gaily careening off the end of a destroyed bridge because the map and road signs disagree and they're all "going with the flow". Or to put it another way, if all your friends drove off a bridge, would you?

f137 · on Nov 3, 2017

What would you do if confused by the road signs? A car could 1) see what other drivers doing 2) ask them - and this is easies with v2v 3) ask some central authority owe the air

skybrian · on Nov 3, 2017

Alternately, if you have multiple models that should normally give the same result, this can apparently be used to automatically find test cases where they differ:

DeepXplore: automated whitebox testing of deep learning systems https://blog.acolyer.org/2017/11/01/deepxplore-automated-whi...

Houshalter · on Nov 2, 2017

I mean people can already do that to human drivers. You can easily remove a stop sign. Or put a sticker over an important sign. Any human driver will be fooled! Yet this doesn't seem to be a prevalent problem.

ChuckMcM · on Nov 2, 2017

And they have: http://www.nytimes.com/1997/06/21/us/3-are-sentenced-to-15-y...

It just makes it harder to catch who did it.

stonewhite · on Nov 3, 2017

Not relevant to the discussion, but turns out these kids didn't steal those signs.

https://www.law.umich.edu/special/exoneration/Pages/casedeta...

ChuckMcM · on Nov 3, 2017

Interesting to read that follow up. The kids had stolen 21 signs, but not those. Thanks for the link.

Houshalter · on Nov 2, 2017

It's also harder to do. It requires technical knowledge and some planning, not some drunk teenagers.

sandGorgon · on Nov 2, 2017

Can such methods be used to "fingerprint" proprietary datasets by tainting them ? For example, i want to make sure that my dataset is not stolen and used by someone else (Waymo?). So I taint it using an adversarial method and create a "canary test set" that will uniquely identify if my dataset has been used in some training.

haraldurt · on Nov 2, 2017

In one sense, no. You can guarantee privacy of any given input (or any subset of k inputs) by applying transfer learning of an ensemble of models trained on subsets of the training data [0][1]. This is useful if, for instance, you train on medical data and you don't want anyone to know that "John Doe, HIV+" was part of the input. If your adversary does not take such precautions, however, then your canary should work.

[0] https://arxiv.org/abs/1610.05755

[1] https://static.googleusercontent.com/media/research.google.c...

alexcnwy · on Nov 3, 2017

I believe google maps has fake streets as a canary for their maps being scraped...

cscheid · on Nov 3, 2017

yup, in cartography these are known as trap streets

ovi256 · on Nov 3, 2017

You don't even need an adversarial attack method to fingerprint or watermark a proprietary dataset. If the dataset consists of images, you could just watermark them, or mark them using steganography. The watermark would appear as random (non-targeted) noise, and would be mostly invisible to current classifiers. Please note that the noise employed by adversarial attacks is very different, it is highly targeted.

scrooched_moose · on Nov 2, 2017

Is there any reason to think this would work at all in the real world? All of these "attacks" require complete control of the image being fed to the classifier.

In the ATM example you don't directly load an image of the check to the computer inside the machine. You design a check in photoshop, add the noise, print it out, feed it to the machine, which takes a picture of the check. Mobile bank apps still require you to take a picture of the check so you don't have enough control there either.

Similarly in the road sign example; the lighting, angle between car and sign, dirt on the sign, etc all mean the car sees a much different image than you designed.

I'd think all of these steps mean the classifier gets a dramatically different image than you intend and the attack fails. There's maybe a vanishingly small probability it works when the stars align, but that could be easily mitigated by taking multiple consecutive images and looking for an odd results.

anishathalye · on Nov 2, 2017

Goodfellow et al. demonstrate that adversarial misclassification works in the physical world: https://arxiv.org/abs/1607.02533 -- a printed image is consistently misclassified (demo here: https://www.youtube.com/watch?v=zQ_uMenoBCk)

Recent work by me and some friends demonstrates that physical-world adversarial examples can actually be made quite robust, and you can synthesize 3D adversarial objects as well, and make them consistently classify as a desired target class: http://www.labsix.org/physical-objects-that-fool-neural-nets...

scrooched_moose · on Nov 2, 2017

Thanks for the links and great work! I hadn't seen any research on it making the jump to the real world yet.

mikepurvis · on Nov 2, 2017

I wouldn't have believed but for those videos. Fascinating stuff.

ronald_raygun · on Nov 2, 2017

Another example where they 3d printed a turtle to fool classifiers

https://twitter.com/daniel_bilar/status/925541876484231168

not_that_noob · on Nov 2, 2017

Slick! 84% success rate in the real world, and a simple clever technique. Basically use another DNN to reverse-engineer the target, find weaknesses using the substitute and then use those examples to make attack vectors. Without a robust mathematical framework to understand why a DNN behaves as it does, this is almost impossible to guard against.

From the abstract: "Machine learning (ML) models, e.g., deep neural networks (DNNs), are vulnerable to adversarial examples: malicious inputs modified to yield erroneous model outputs, while appearing unmodified to human observers. Potential attacks include having malicious content like malware identified as legitimate or controlling vehicle behavior. Yet, all existing adversarial example attacks require knowledge of either the model internals or its training data. We introduce the first practical demonstration of an attacker controlling a remotely hosted DNN with no such knowledge. Indeed, the only capability of our black-box adversary is to observe labels given by the DNN to chosen inputs. Our attack strategy consists in training a local model to substitute for the target DNN, using inputs synthetically generated by an adversary and labeled by the target DNN. We use the local substitute to craft adversarial examples, and find that they are misclassified by the targeted DNN. To perform a real-world and properly-blinded evaluation, we attack a DNN hosted by MetaMind, an online deep learning API. We find that their DNN misclassifies 84.24% of the adversarial examples crafted with our substitute. We demonstrate the general applicability of our strategy to many ML techniques by conducting the same attack against models hosted by Amazon and Google, using logistic regression substitutes. They yield adversarial examples misclassified by Amazon and Google at rates of 96.19% and 88.94%. We also find that this black-box attack strategy is capable of evading defense strategies previously found to make adversarial example crafting harder."

fenwick67 · on Nov 2, 2017

Wow this is incredible

blauditore · on Nov 3, 2017

It seems like all adversarial examples I've seen so far are enabled by networks overfitting to small-scale features. By overfitting I mean that they give more importance to details compared to humans, who recognize much more the image as a whole.

Maybe the problem could be mitigated by penalizing "direct" responses to small-scale features while training, which is not a trivial thing to do though. One approach I could think of is training with multiple altered versions of the image, e.g. various amounts of blur, noise and mean/median filters applied. Or the other way around: To be more confident about a result, scale down the image to a fraction of it's size, run detection on that and compare results.

Are such techniques in use, or being researched on? I'm not too much in the loop about those topics.

rahimnathwani · on Nov 3, 2017

"training with multiple altered versions of the image, e.g. various amounts of blur, noise and mean/median filters applied"

"Are such techniques in use"

Yes, it's called 'data augmentation': https://medium.com/towards-data-science/deep-learning-3-more...

joe_the_user · on Nov 2, 2017

"Recent studies by Google Brain have shown that any machine learning classifier can be tricked to give incorrect predictions"

-- That has to be an overly broad statement (edit: it might true if you say "neural nets" or something specific instead). I would assume they mean any standard deep learning system and maybe any system that is more or less "generalized regression" but that couldn't be "any machine learning system", I mean one could imagine a deterministic model that couldn't be "tricked".

Plus a link would be good (it's not around the quote, I don't know if it's elsewhere in the article).

alexbeloi · on Nov 2, 2017

>I mean one could imagine a deterministic model that couldn't be "tricked".

The models in reference are deterministic, most models for classification/regression are deterministic.

Adversarial attacks are showing that the models are chaotic (in the dynamical systems sense), they're very sensitive to their inputs.

edit: Not all models have this issue, it's been shown that all the typical image based convolutional neural networks are susceptible to this issue. My guess is that it's a more general problem of high dimensional inputs.

xapata · on Nov 2, 2017

We've known for decades that NNs are prone to overfit. This is just another example of it. With great variance comes great ... uh ... susceptibility to unusual future inputs?

joe_the_user · on Nov 2, 2017

Yeah, my complaint is that they're using the generic phrase "machine learning" for specific deep learning and other neural net methods. Certainly, a wide class of "black box" algorithms can be fooled, mostly because they extrapolate in surprising and paradoxical ways.

bertil · on Nov 2, 2017

It could be this: http://www.labsix.org/physical-objects-that-fool-neural-nets...

It is impressive, but that came out a couple of days ago and is not reviewed yet, as far as I can tell.

It’s not obvious from the URL but two of the authors were interns at Google, which could explain why they describe some aspects of InceptionV3 as white-box (because it's not clear if they have re-trained it themselves).

anishathalye · on Nov 2, 2017

It's probably referencing Szegedy et al. 2013 (https://arxiv.org/abs/1312.6199).

zitterbewegung · on Nov 2, 2017

No system is 100% secure. What they mean is you can train an adversarial input that would work with a deep learning system and this can fool other machine learning systems. Or that you can create a deep learning system that would fool another (you use a GAN to make forgeries to fool your target ).

j2kun · on Nov 2, 2017

> No [machine learning] system is 100% secure.

That is what's being claimed without proof, when they meant "existing neural network systems are known to be insecure"

LolWolf · on Nov 2, 2017

Sure, you can't claim that every machine learning system is susceptible to this attack (here's one that isn't, a model that always returns the same class on all inputs).

But (apart from obvious trivialities) I suspect this is a more general problem with interesting properties—the claim the GP is making boils down to something like: a non-trivial[0] machine learning model with large parameter space is, with vanishing probability (as the number of distinct training samples increases), robust to classes of eps-bounded attacks.[1] This seems like a provable claim that is well-defined and (in a weird, handwavy 'intuitive' kind of way) likely to be true... there are just way too many parameters and too much uncertainty in the local minima that we reach when training from such classes of examples to have 'robustness'.

Perhaps I'm totally wrong, though, and someone will come up with a regularizer that prevents all of this from happening, but this seems highly not obvious to me, at first glance.

---

[0] Non-trivial means that, say, it has non-vanishing curvature (Fisher information, assuming the model returns a vector of probabilities) almost everywhere. I hate to be so nitpicky, but I suspect someone would soon comment asking for definitions of all of these things and that the problem isn't 'well-defined.'

[1] Given a (large enough) set of inputs, the max-norm difference between at least one input and the adversarial example must be ≤eps. In other words, if we want the model to misclassify 'turtle' with 'dog', we don't just give a dog picture---the adversarial picture must look like a turtle in a mathematical sense.

comex · on Nov 3, 2017

That’s a very strong claim, though, given that we’re explicitly talking about hypothetical future systems as opposed to “existing neural network systems”. After all, taking the definition of “machine learning” to the extreme, it could include a system based on simulating an entire human brain. So the statement would necessarily imply that either:

(a) brain simulation is fundamentally impossible for some reason (insert your favorite argument for why this should be the case), or

(b) epsilon adversarial examples exist that fool humans (we just haven’t found any).

Personally, I think (b) is more likely than (a), but it seems much more likely that neither is true: that the human visual system is resilient at least to some extent (...for some epsilon), and could hypothetically be simulated by a machine. It would stand to reason that building a resilient system doesn’t require simulating a human, either, or nearly as much computational power as that would take (given that emulation is inherently very inefficient, and that visual processing only takes up a part of the human brain and seems to be done pretty well by animals with simpler brains). But it might require much more computational power than we have today…

On the other hand, if you limit computational power to around today’s level, and consider only potential architectural changes, the statement seems more plausible to me.

LolWolf · on Nov 4, 2017

Not only do I think (b) is more likely than (a), I think (b) is strictly true. I mean, our concentration is quite limited and there are optical illusions which are, perhaps even more strongly than eps-adversarial examples for a given architecture, very general problems. So, given a "brain-simulated ML" thing (whatever that may mean, but let's assume it's the case), I suspect there may be a way of coming up with even more striking adversarial examples—but, considering that architectures vary heavily between humans, I suspect this would be quite person-specific.

Anyways, my final point is in saying that such a claim doesn't seem too far-fetched in any way.

That being said: I could be totally and hilariously wrong.

j2kun · on Nov 3, 2017

Unfortunately, it is known that many classes of problems can be learned with adversarial manipulation of an epsilon-fraction of the training data. See for example the paper "PAC learning with nasty noise," though there are many other adversarial noise models in which resilience can be proven (though proving it even for a specific concept class would be enough to prove it wrong that "all machine learning systems are vulnerable").

LolWolf · on Nov 4, 2017

But the claim the paper is making fits perfectly with the above:

> On the negative side, we prove that no algorithm can achieve accuracy of ε < 2η in learning any non-trivial class of functions.

Though this is for a very particular type of adversary which has a lot more power than I'm claiming. In my case, though, I'm strengthening the side that the adversary has a very large number (e.g. infinite, as my claim "N->infty") of possible examples in the training set to be eps-close to.

---

Rereading the above: Sorry, I'm not quite sure I understood your point! Are you claiming that there is such a model which is resilient to adversarial examples, and also has a large number of parameters relative to the hypothesis class size? (e.g. say VC dimension?) In that case, my claim would be definitely false, but I have yet to see such a paper (if you have a reference, I'd love to check it out!)

shmel · on Nov 2, 2017

Really? I believe they meant exactly what they said. Read https://arxiv.org/abs/1605.07277

Adversarial examples can be easily crafted for linear regression, SVM, decision trees and k-NN at very least. I can't think about any ML technique that is proven to be secure against them.

There is also an old blogpost from Karpathy describing that the only reason we are not concerned about adversarial examples in linear models is that nobody uses them to classify ImageNet.

j2kun · on Nov 3, 2017

Yes, many models are vulnerable. But there are no proofs that allow you to say that. You should also be aware that they're only talking about image classification in all of these.

I will only accept blanket statements about "all machine learning systems" when there's a mathematical proof.

I don't mean to lessen the importance of the work, just to point out that saying "X fails at Y on subtask Z using the techniques we have today" is very different from saying "All possible X's, current and future, fail at Y."

joe_the_user · on Nov 2, 2017

"Machine learning" is a broader category than "deep learning", just saying.

obastani · on Nov 2, 2017

I think they might be referring [1], which shows that adversarial examples transfer to models such as decision trees and SVMs.

[1] https://arxiv.org/abs/1605.07277

xstartup · on Nov 2, 2017

Well, this is getting more attention because it's important for the "ad network moderation". If Google/FB fail at moderation using these methods, they'll have to HIRE lots of humans to do it for them, which often involves contracting it to outsources like Accenture. This will put downward pressure on their billion dollar revenue. Humans are expensive but vastly effective at ad moderation.

trott · on Nov 2, 2017

Very recently, it was found that changing a single, carefully placed, pixel is enough to confuse NN classifiers. https://arxiv.org/abs/1710.08864

dontreact · on Nov 3, 2017

For 32 by 32 images

shpx · on Nov 3, 2017

1/(32*32) = 0.1%

acallahan · on Nov 3, 2017

Obscurity seems like useful security here. IIUC it shouldn't be possible to e.g. trick self-driving cars with noisy signs, unless you have a copy of the classifier to train against. Thinking about ATMs, you could train against it as a black box, repeatedly inserting different patterns of noise? But it seems probably infeasible if you need to do a lot of iterations.

It also suggests that people concerned about adversarial attacks shouldn't use off-the-shelf pretrained classifiers, where attacks can be trained offline in advance. Similar to hashing algorithms and rainbow tables, maybe a practice of "salting" an off-the-shelf classifier could be effective in dodging attacks.

3pt14159 · on Nov 3, 2017

True. Also, just having unconnected systems that use different types of features / heuristics should be enough to at least pull the car over when they wildly disagree over what to do.

neptvn · on Nov 2, 2017

> Apart from the fact that nobody wants to risk having false positives, there’s a simple argument as old as machine learning itself: whatever a human can do, a machine can be taught to do. Humans have no problem correctly interpreting adversarial examples, so there must be a way to do it automatically.

So by all means, the authors (and a vast majority of researchers) seem to be confident that ML/DL is the road to AGI, hence can "solve" human intelligence (given it is computational)? For how long are we gonna drag the adage that mimicking a human (Turing test) is equal to reaching human levels of intelligence?

moxious · on Nov 2, 2017

It is not true that whatever a human can do, a machine can be taught to do. The human must have insight into HOW they do it in order to teach it, or otherwise come up with some new algorithm. There are a large class of things humans do that they don't understand the mechanics behind, and for which there also aren't algorithms.

I'm not talking empathy or philosophy. How about just folding laundry. Not just one type, not in a controlled environment, but folding any laundry anywhere.

IanCal · on Nov 3, 2017

> It is not true that whatever a human can do, a machine can be taught to do. The human must have insight into HOW they do it in order to teach it, or otherwise come up with some new algorithm.

Why does this mean a machine cannot be taught to do it?

> There are a large class of things humans do that they don't understand the mechanics behind,

Sure.

> and for which there also aren't algorithms.

Well, we manage to encode it in our brains.

Pedro9009 · on Nov 3, 2017

It is not true that whatever a human can do, is the correct way to look at it. The fine motor skills and insight needed for many tasks will always be beyond that of a machine. A machine also has a great deal of trouble with adapting to things popping up that it has to deal with.

zardo · on Nov 2, 2017

>Humans have no problem correctly interpreting adversarial examples, so there must be a way to do it automatically.

Not true, we just have different priors and are fooled in different ways. See stage magic, optical illusions, etc...

throwaway613834 · on Nov 2, 2017

> Not true, we just have different priors and are fooled in different ways. See stage magic, optical illusions, etc...

But we can often recognize them as such, which is important. Actually, on that note, is there work on making machines being able to recognize magic tricks?

ben_w · on Nov 3, 2017

> But we can often recognize them as such, which is importan

Can we? The occult, new age, and religious sections of bookstores suggests otherwise, as do paid horoscope readings, homeopathic medicines, un/lucky numbers (housing and lottery), and shell-game scams.

soundwave106 · on Nov 3, 2017

All of those (along with stage magic and illusions) have plenty of material (in bookstores and elsewhere) describing either the mechanics, the long odds, or debunking them largely as pseudoscience and/or scams. So it's clear that some people at least can recognize them.

I think it's also possible for humans to be 100% aware of the "adversarial attack", and still use these types of mediums for light entertainment. This seems to describe many people who occasionally buy lottery tickets for entertainment, and would probably apply to many who attend and produce modern stage magic shows.

(In fact, I notice that some of the top modern crusaders against con artists who do use illusions and paranormal / occult claims for adversarial reasons are stage magicians themselves -- James Randi, Penn and Teller, Derren Brown.)

ben_w · on Nov 3, 2017

“Some” is not ”often”. If there are any illusions that affect all humans, then by definition we cannot give them as examples because nobody realises they exist.

Yes, it is possible for people to know lottery odds and still play for the excitement. This does not invalidate the claim that many play the lottery with the expectation of winning, nor that people choose numbers superstitiously.

aidenn0 · on Nov 2, 2017

ELI5: Why are adversarial attacks not preventable by adding unpredictable noise to untrusted inputs?

Houshalter · on Nov 2, 2017

The attacks are resistant to noise, or at least can made to be so. If every single input is tweaked in exactly the right direction, noise won't undo that. Most inputs will still be pointing in the adversarial direction. The noise will move some inputs back to their original position, but others will be pushed even further into adversarial territory.

aidenn0 · on Nov 3, 2017

Thanks, that makes a lot of sense.

SNIPERBABY · on Nov 3, 2017

Good to know, thanks!

WalterSear · on Nov 2, 2017

Because they aren't noise: they just seem like noise to humans.

If you add in noise, then you have to train the network to disregard that noise. And the adversarial input will then be features that occur above this noise floor you ignoring.

d--b · on Nov 2, 2017

Research is needed in that area, but surely adding some noise / blurring to sample during training and during recognition should help reduce the feasibility of the attack. Isn't that the case?

Adversarial attacks seem to point to overtraining in some sense.

shmel · on Nov 3, 2017

Unfortunately, it is not that simple. There are multiple ways to generate adversarial examples and simple defenses help at most against the simplest attacks. Every now and again a new preprint "one simple trick defeating the latest adversarial defense technique" appears on arxiv.

I believe they are linked to the nature of DL (and ML in general) models. We try to capture very tiny manifold of natural images in the space of all possible images. We found a technique that does it well (CNNs). But by the very definition of its training we train it to output some values in a finite number of points. In the same time CNN's output outside of those points are mainly defined by the model's smoothness. We can expect that in a neighborhood of a given image the output will be roughly the same (it allows them to generalize). Far from any point of training set the model can say anything. Adversarial examples basically hint that this smoothness work well only along natural images manifold, once we step outside it is much more chaotic. Or, equivalently, the neighborhood where the CNN gives roughly the same output is a very thin "slice" that closely follows natural images manifold. Why is that? Probably there are some non-trivial topological reasons. But it is exactly what nobody understands now.

amelius · on Nov 3, 2017

Am I the only one who finds the term "Adversarial Attack" a bit funny? See [1].

[1] https://en.wikipedia.org/wiki/Pleonasm

barrkel · on Nov 3, 2017

ISTM that many of the classifiers get a lot of their signal from detecting textures, and that the adversarial noise works by superimposing a different texture on the image.

xstartup · on Nov 2, 2017

There is a huge industry for folks who are finding ways to bypass Facebook's/Google's ad moderation. They pay you if you craft ads as per their liking and it approved.

naveen99 · on Nov 2, 2017

Don't you need access to the classifier internals to train the adversarial network ? Nobody is going to publish the network weights for a check reading machine...

Winterflow3r · on Nov 2, 2017

No, this paper by Papernot et al shows how to do blackbox attacks without knowledge of model internals. https://arxiv.org/abs/1602.02697

naveen99 · on Nov 2, 2017

Thanks. But they still need to be able to use the black box a brute force number of times on shady inputs and get detailed outputs during their gradient descent. Not going to be allowed on a check reader or anything sensitive.

haraldurt · on Nov 2, 2017

Not necessarily. Adversarial examples have been shown to, for instance, be transferable across different networks with different hyperparameters (e.g., number of layers) trained on disjoint subsets of a training set [0, section 4.2]. There are more references from the paper linked by the OP.

[0] https://arxiv.org/abs/1312.6199

naveen99 · on Nov 3, 2017

Thanks. I wonder if adversarial training helps prevent overfitting too. Could you use adversarial training to beat alphago ?

yorwba · on Nov 3, 2017

You could not, because AlphaGo is not a classifier (so it isn't well-defined what an adversarial example is) and the input space is discrete (Go board state) and you can't do ε-small perturbations (two different states differ by at least one stone).

samat · on Nov 3, 2017

Do this attacks still work if you don’t control input image bit-by-bit, but rather feed it via digital camera? While using any digits camera for input in the real world there will be some blurrines and distortion. Is it possible to construct such a noise that 1) will survive camera input 2) will be undetectable with human eyes 3) will fool a ML algorithm?

I do believe you are not able to control input image bit-by-bit in many real-world scenarios.

taejo · on Nov 3, 2017

As has been posted elsewhere in this thread, you can 3D print objects that are consistently misclassified when viewed through a camera from any angle: http://www.labsix.org/physical-objects-that-fool-neural-nets...

eroccatlun · on Nov 2, 2017

Does the adversarial attack require access to the model making the predictions?

It seems like the attack relies on `doping` the input with features present in a different target, or by masking features of the existing target.

Could you so specifically attack a target without knowledge of its features?

dontreact · on Nov 3, 2017

Was there a follow up to this paper? NO Need to Worry about Adversarial Examples in Object Detection in Autonomous Vehicles https://arxiv.org/abs/1707.03501

bartkappenburg · on Nov 3, 2017

I tried the example from the blog post (Stallone <-> Reeves) in Rekognition (AWS service) and it came out correct:

https://imgur.com/a/C0s05

dontreact · on Nov 2, 2017

I don’t buy any of the attacks listed here or see how the examples being imperceptible is actually a factor.

If you have the ability to modify the check why not make it actually look like it’s for a 1000000 dollars (ie even to a human).

If you are going to go out and replace speed limit signs to fool self driving cars, it’s probably equally dangerous whether or not the change is obvious, because if it’s way out of bounds a car won’t do it but if it isn’t then it would also fool humans.

Anyone have some more realistic attacks that are specifically possible due to the imperceptible nature of the changes in the input? Most of the harm I’ve heard in examples simply comes from the fact that if you are controlling the input to an ml system, out can get it to do whatever you want even without using any of these techniques: just actually change the class of the input.

maltalex · on Nov 2, 2017

> If you have the ability to modify the check why not make it actually look like it’s for a 1000000 dollars (ie even to a human).

Humans are harder to fool. But some bank apps allow you to deposit a check by photographing it. Such apps would be fairly easy to attack.

If you limit yourself to making 100$ checks become 1000$ checks and not 1,000,000$ checks, you might even get away with it.

c22 · on Nov 2, 2017

A few years back a friend of mine was depositing a $300.00 check at Chase. When the teller asked him how she could help he said "Oh, just depositing this three thousand dollars." She punched in the deposit for $3000.00 and gave him a receipt. It was corrected within 24 hours but the receipt and printout of his bank statement was a great conversation piece. I doubt they would have let him just walk away with the money, nevermind multiple times.

dontreact · on Nov 2, 2017

You have to type in a number. So if you are going to type in 1000, it would probably more likely be a successful attack if you actually forged the check so it looked like it said 1000, so again not seeing how an adversarial example is helpful here.

squeaky-clean · on Nov 2, 2017

> If you have the ability to modify the check why not make it actually look like it’s for a 1000000 dollars (ie even to a human).

Plausible deniability? "I don't know why it took $1 million for a $100 check, look the check clearly says $100 on it, I didn't edit it. Must be a bank glitch."

rtkwe · on Nov 2, 2017

All the deposit systems I've used recently still require you to enter the deposit amount so if you're entering 1000000 for a check that 'clearly' says 100 you lose all plausible deniability because you're actively lying to the bank at that point.

throwaway613834 · on Nov 2, 2017

Huh, I'd always wondered why they ask you to enter the number. Thanks for pointing this out!

rtkwe · on Nov 3, 2017

Yeah it's a combo of that and also just a check on the CV that reads the checks. Mainly it's the latter since it'll be really rare for people to maliciously enter an incorrect amount.

pdeburen · on Nov 2, 2017

While it's true that these attacks work well on state-of-the-art models, there are defence strategies such as including adversial examples during training. Advanced defence strategies such as https://arxiv.org/abs/1705.07204 are robust to a wide array of attacks and achieve very competitive error rates.

I'm not saying it's not a problem but there are successful defence strategies already in place for many attacks.

LolWolf · on Nov 2, 2017

> Although our models are more vulnerable to white-box FGSM samples compared to the v3adv model, ensemble adversarial training significantly increases robustness to black-box attacks that transfer FGSM samples crafted on the holdout Inception v4.

So, I just train my new adversary on the "new" model that was trained on the previous adversarial examples. And now we're back to square one.

I suspect the problem of adversarial attacks is a problem of high-dimensional spaces, not of training on particular samples.

pdeburen · on Nov 2, 2017

I'm not sure how the quote supports your argument. Adversial examples generalize well accross many different classifiers.

Shallow NN's can be fooled just as well, it seems to be more of a problem of linear models in general. Apparently Geoff Hintons Capsule Networks are more robust due to being "less linear" (Ian Goodfellow mentioned this in a recent talk, don't have the references now to back it up)

LolWolf · on Nov 3, 2017

I'm not sure it's about how shallow the network is; even logistic regression can be fooled by the same techniques (e.g. 1-layer NN). That being said, maybe it does have something to do with linearity (I suspect not) or maybe it's just generally harder to deal with nonlinear functions.

erikb · on Nov 3, 2017

I see an SaaS opportunity here, that takes your photos, adds noise, and only then uploads it to social networks.

lazugod · on Nov 2, 2017

Adversarial attacks sound useful. How can we make it illegal to prevent them?

matt4077 · on Nov 2, 2017

I may not completely understand the question, but I'm pretty sure "Intentionally fooling an algorithm for nefarious purposes is illegal" is both the answer, and the status quo.

breakingcups · on Nov 3, 2017

> How is this possible? No machine learning algorithm is perfect and they make mistakes — albeit very rarely.

Right.

80x86 · on Nov 2, 2017

ATMs using neural nets - lol - nope. That is a stretch.

genericpseudo · on Nov 2, 2017

Effective counterexample: the USPS.

http://www.cedar.buffalo.edu/~srihari/talks/Telcordia.pdf

conorcleary · on Nov 3, 2017

The _only_ solution is coming up with a globally-accepted open source base set of instruction for any and all AI or AI-capable code.

yters · on Nov 2, 2017

Ah, this is why AlphaGo won't release their source code or models.

If all ML algorithms can be fooled so trivially, this shows the human mind is not an ML algorithm.

maltalex · on Nov 2, 2017

> this shows the human mind is not an ML algorithm.

not necessarily.

Humans have a lot more knowledge about contextual information that today's ML often misses. For every "image" you classify you use a LOT of side data. That side data has also been acquired through learning.

Even if your actually saw a 1,000,000$ check or a 200 km/h speed limit, you'd know it's probably nonsense. Based on your life experience so far, you know that there's no way that mom just gave you a 1,000,000$ check, or that the city decided to turn that small residential street into a racing track. An image classification ML algorithm doesn't know any of that.

no_flags · on Nov 2, 2017

What about the Sylvester Stallone/Keanu Reeves example? It was just a zoomed in head shot. That seems like an example where humans do a lot better without help from any additional context.

munificent · on Nov 2, 2017

I think your logic is:

1. All ML algorithms can be fooled trivially. 2. The human mind cannot be fooled trivially. 3. Therefore, the human mind is not an ML algorithm.

But claim number 2 is clearly wrong. Human minds are trivially fooled. Here's one:

http://www.jimonlight.com/wp-content/uploads/2012/02/Paralle...

This is exactly what an optical illusion is.

schoen · on Nov 2, 2017

Yes, when I read "Humans have no problem correctly interpreting adversarial examples, so there must be a way to do it automatically" in the article I thought "aren't optical illusions adversarial examples for human perception?" and also "maybe we just haven't found the other kinds of adversarial examples that work against human perception because we can't show humans sufficiently large numbers of examples and measure the humans' responses accurately enough".

But the optical illusion example seems very strong because if you say that human perception includes abilities to classify things, there are many examples just like your link that show cases where the classification routinely goes wrong.

It seems possible to me that there are as-yet undiscovered adversarial examples in human perception that we simply don't have any feasible way to search for, and may never be able to construct practical examples of. (It would probably be extremely unnerving to experience one in real life.)

yters · on Nov 2, 2017

Optical illusions are reverse adversarial examples to those who think the human mind is a deep neural network.

danharaj · on Nov 2, 2017

Cool illusion; I figured it out and so will most humans. I don't think the claim that these sorts of illusions are of the same nature as, say, single pixel attacks against neural networks is justified. On the other hand, it is also true that it's not clear whether such attacks on our current crop of primitive neural networks will work on more advanced autonomous systems that regulate themselves.

yters · on Nov 2, 2017

In theory it seems that with sufficient understanding of any ML system, regardless of sophistication, such examples can be constructed. It is essentially the brain in the vat problem. Computers have no access to objective reality, whereas we cannot say the same regarding the human mind without making strong assumptions about the nature of the human mind.

To put this in comp sci language, we can think of DNNs as proofs that certain inputs belong to certain classes. Since no proof system can be both complete and consistent, something outside the proof system can always either provide unprovable examples or adversarial examples.

danharaj · on Nov 2, 2017

I think the proper test is how stable the learning system is in the face of adversarial examples. I have no indications either way that we could, using current ideas, create a system that learns and is autonomous like a human brain while regulating itself from going too far off the rails. An optical illusion won't lead to you thinking you're the king of France, nor do there seem to be basilisks that will actually crash a brain.

It remains to be seen if the same holds for artificial learning systems. Of course! The gap between current learning systems and even the brain of a rodent is vast and we have yet no idea how to get there.

munificent · on Nov 2, 2017

> An optical illusion won't lead to you thinking you're the king of France

I'm not sure what you're getting at here. By definition, optical illusions are perceptual mistakes limited to the optical system.

It is certainly possible for an optical illusion to trick someone into doing something they don't intend to do because they are fooled into believing the illusion. Imagine painting a set of stairs with a disorienting pattern of shading that makes them appear off, leading you to trip and fall down them.

I don't think it's necessarily for a perceptual illusion to hijack your entire belief system in order to be considered an effective adversarial attack.

yters · on Nov 2, 2017

The whole point of these examples is the DNN gives erroneous classifications with near complete certainty. This looks to me like hijacking a belief system.

yters · on Nov 2, 2017

The incompleteness/inconsistency theorem I mention seems to indicate no such ML system is possible. It will either encounter examples it cannot deal with, or it will be deceived. ML can only be guaranteed to work within very limited domains.

YeGoblynQueenne · on Nov 2, 2017

The problem with optical illusions like that is that they are, in their vast majority, made of abstract shapes. Most of them play with our perception of distance and depth - and the majority again work on two dimensions, only.

It's really hard to imagine an optical illusion that makes you mistake objects in the physical world for something else- say, panda for a lawn mower or a car for a pigeon, or something like that.

Note that I don't agree that this tells us anything about whether the human brain (or mind) is "like" a machine learning algorithm. To me this question has about as much meaning as asking if the brain is "like" quicksort.

The physical substrates are so clearly different that the only comparison you can make is on the level of capabilities (say, both are Turing-equivalent etc) not that of actual structures. Like, where's the 1's and 0's in the brain?

munificent · on Nov 2, 2017

> It's really hard to imagine an optical illusion that makes you mistake objects in the physical world for something else- say, panda for a lawn mower or a car for a pigeon, or something like that.

Here's a physical object that makes you mistake an insect for a plant:

https://en.wikipedia.org/wiki/Phasmatodea

romaniv · on Nov 2, 2017

Problem is, mimicry involves copying essential properties of the target object: color, texture, shape, movement dynamics and so on. It relies on true ambiguity. Adversarial examples against neural networks (the interesting ones, anyway) involve a combination of insignificant, seemingly random permutations that only work in their totality. That's a very important difference.

munificent · on Nov 6, 2017

> essential properties of the target object

They are only "essential" properties according to the limitations of your particular perceptual system. To a bee that can see UV light, a stick bug may look entirely different from an actual stick. An animal that hunts by scent would find them clearly distinct.

There is no such thing as "true" ambiguity unless the two objects are actually the same thing in all respects. If they are distinct but appear the same, it's because they are overlapping in some respects but not all.

romaniv · on Nov 7, 2017

> They are only "essential" properties according to the limitations of your particular perceptual system. To a bee that can see UV light, a stick bug may look entirely different from an actual stick. An animal that hunts by scent would find them clearly distinct.

This is a very confused statement. Fist, since we are talking about image recognition we are - by definition - talking about vision in the spectrum that can be captured by a digital camera and encoded in a typical image format. Second, there definitely is such a thing as essential property. It is a matter of correlation with reality, as well as internal consistency. For example, plants are green and have leaves because of the way they use sunlight. So permuting color of all leaves in the picture is fundamentally different from permuting luminosity of some random pixels.

YeGoblynQueenne · on Nov 2, 2017

What romaniv said- it's a different thing.

chiaro · on Nov 3, 2017

A more accurate example in the non-2D realm: https://en.wikipedia.org/wiki/Pareidolia

throwaway613834 · on Nov 2, 2017

> It's really hard to imagine an optical illusion that makes you mistake objects in the physical world for something else- say, panda for a lawn mower or a car for a pigeon, or something like that.

Sure but people do, for example, mistake each others' faces or voices. You don't need to mistake your friend for a lawnmower for it to be dangerous.

Also, for example, I often mishear my own name when someone else is speaking. Might not happen with every name but it does with some.

YeGoblynQueenne · on Nov 2, 2017

Well, people misidentify others when viewing conditions are poor or when they dont' know the other person well. For instance, say my mother was standing in front of me in broad daylight and I was given half a second to look at her face; I really don't see how I'd fail to identify here, unless her face had changed drastically for some reason.

So, to be fair, this too is a different thing than what we're talking about.

jpttsn · on Nov 2, 2017

https://m.youtube.com/watch?v=A4QcyW-qTUg

YeGoblynQueenne · on Nov 2, 2017

Again, this is an optical illusion that fools depth perception. It doesn't make you see the T-Rex as a tea-pot, say.

Different things.

jpttsn · on Nov 3, 2017

Fair point; it's not exactly the same mistake. If humans made those, this discussion would probably never happen.

What I tried to rebut is your first, weaker assertion:

> The problem with optical illusions like that is that they are, in their vast majority, made of abstract shapes. Most of them play with our perception of distance and depth - and the majority again work on two dimensions, only.

This makes me see a moving 3d object when I am in fact looking at a static flat object; and I can't shake it, even if I try.

YeGoblynQueenne · on Nov 3, 2017

OK, but I did say "the (vast) majority".

Also, from what I can tell, this just happens to work better with a meaningful shape (the image of a T-Rex). It has something to do with how we confuse convex with concave shapes and the T-Rex's snout just happens to be very convenient to demonstrate this confusion (e.g., I think an image of a horse would work as well). I think the illusion would also work with an abstract shape- except it would be harder to find a good abstract shape to demonstrate it as clearly as with the T-Rex.

empath75 · on Nov 2, 2017

I know fmri isn't high enough resolution for this, but I wonder if you had real time access to brain state, you could construct adversarial examples like these.

romaniv · on Nov 2, 2017

The comparison in disingenuous. This illusion has nothing to do with image classification and heavily relies on adding misleading context to the object to alter the perception of one of its properties. Most optical illusions are like that. That is not what is being done with adversarial examples.

yters · on Nov 2, 2017

In order for optical illusions to be a counter example, I must believe those lines are not parallel with near certainty. While I perceive them as such, I don't believe they are. The fact that I can know my perception does not match reality is what gives optical illusions their novelty. Thus, the very nature of optical illusions makes them not a counter example.

A true counter example must be similar to making me believe a car is a toaster by manipulating a single cone cell in my eye. Such a possibility is absurd, which leads me to conclude the human mind is not ML.

red75prime · on Nov 3, 2017

> the human mind is not ML

Pooling layers in CNNs throw away lots of information about spatial co-occurrence of features, which leads to a possibility of adversarial images where adversarial features are scattered all over the place and so they don't significantly affect humans' visual processing.

The conclusion should be "Human mind is not that kind of ML".

yters · on Nov 3, 2017

CNNs are the best we have. If they can't deal with adversarial examples, then what can?

red75prime · on Nov 3, 2017

Hinton's "capsule networks", probably. Or any other future architecture which preserves positional information better.

yters · on Nov 3, 2017

It is a problem inherent in any ML method with large VC dimension. The model will overfit, and when it overfits, adversarial examples become possible.

The very nature of ML seems incapable of dealing with this dilemma. You either work within a very constrained domain (low VC) to ensure everything is covered, and thus are unable to fit complex datasets, or expand the domain (large VC) and thus overfit the datasets. It's the bias-variance tradeoff, and is insurmountable.

red75prime · on Nov 3, 2017

Adversarial examples are no longer adversarial in any meaningful sense, when humans are fooled by them too. We don't need to surmount BV tradeoff, we need to make sure that ML models make the same tradeoffs as humans.

yters · on Nov 3, 2017

Evidence that humans are fooled by adversarial examples? Optical illusions and other perception issues don't count, as explained by many others in this set of threads.

red75prime · on Nov 3, 2017

Camouflage. Paintings perceived as real 3d objects for a fraction of second.

yters · on Nov 3, 2017

You think these are equivalent to changing a single pixel and causing a complete misperception of the object which is believed with almost 100% certainty?

red75prime · on Nov 5, 2017

Single pixel attack is an artifact of the network architecture. It is not inherent vulnerability of every ML technique.

j2kun · on Nov 2, 2017

Take care, the original authors were being overly broad when they implied "all ML algorithms" can be fooled in this way.

Not that I disagree with your conclusion. Just the hypothesis.

yters · on Nov 2, 2017

All ML algorithms are either inconsistent or incomplete, which we know a priori. Insofar as ML tries to be complete, it will be inconsistent, as we see with adversarial examples.

matt4077 · on Nov 2, 2017

> Lately, safety concerns about AI were revolving around ethics — today we are going to talk about more pressuring[sic] and real issues.

Nice how this casually demeans peoples' worries. Of course, the (demonstrated) idea that ML algorithm would pick up on, and amplify, discrimination (among other things) isn't "real" to these guys.

Risk-scoring of loan applications actually happens to be one of the few uses of ML in the non-tec sector, and it's incredibly likely that some of them are already denying people's application because they happen to be named "La David" and not "Emil". But maybe the authors just don't consider "ethics" to ever be a "real" problem?

But "change a single pixel and the ATM gives you $1,000,000" is, apparently, a "real" and "pressing" problem.

mLuby · on Nov 2, 2017

Am I the only one who thinks "adversarial attack" is both a redundant and unhelpful name?

- Redundant: Anyone who attacks you is by definition your adversary.

- Unhelpful: According to the article, "designing an input in a specific way to get the wrong result from the model is called an adversarial attack." That sounds much closer to spoofing attack ("a situation in which one person or program successfully masquerades as another by falsifying data" -Wikipedia). For example, a turtle masquerades as a gun by spoofing the machine learning system by changing irrelevant visual details.

tlb · on Nov 2, 2017

Adversarial means something specific in the ML community. https://en.wikipedia.org/wiki/Adversarial_machine_learning