1) In "Becoming a Backprop Ninja", dx is never declared. Is it a variable that was set in a previous pass, or is it some value like 1 that depends on its function? I understand how to derive dx1 and dx2 from dx.
2) In the SVM, can "pull" take values other than -1, 0, 1? It seems like its magnitude might affect how rapidly the SVM learns.
3) It takes 300 iterations to train the binary classifier. Could you add a brief paragraph about why this isn't 30 or 3000, and what has the greatest effect on the number of iterations?
4) In the 2D SVM, what is the α function? Is w0^2 + w1^2 an arbitrary choice of regularizer (e.g., would |w0| + |w1| work as well)? It reminded me of least-squares fitting and RMS power, so I wondered about the reasoning behind it.
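For question 1, here is how I currently understand dx: it is the gradient flowing in from the part of the circuit above the gate, and at the very top of the circuit it is seeded with 1.0 (since d(out)/d(out) = 1). A minimal sketch of a multiply gate's backward pass, with names of my own invention (not from the guide):

```python
def multiply_gate_backward(x1, x2, dx):
    """Given the upstream gradient dx on the product x = x1 * x2,
    return gradients on the inputs via the chain rule."""
    dx1 = x2 * dx  # d(x1*x2)/dx1 = x2, scaled by upstream gradient
    dx2 = x1 * dx  # d(x1*x2)/dx2 = x1, scaled by upstream gradient
    return dx1, dx2

# At the output of the whole circuit, the gradient starts at 1.0:
dx1, dx2 = multiply_gate_backward(-2.0, 3.0, 1.0)
```

If this reading is right, dx is "declared" implicitly by whichever gate sits above, and only the final output's gradient is a literal 1.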
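For question 2, my mental model of the fixed-magnitude pull (function name is mine, a sketch rather than the guide's code):

```python
def svm_pull(label, score):
    """Fixed-magnitude pull: +1/-1 when the margin is violated, else 0.
    A larger magnitude (e.g. the size of the margin violation itself,
    as in a true hinge-loss gradient) would change the effective step size."""
    if label == 1 and score < 1:
        return 1   # push the score up
    if label == -1 and score > -1:
        return -1  # push the score down
    return 0       # margin satisfied, no pull
```

So the question is whether replacing the constant ±1 with a violation-proportional value would speed up or destabilize learning.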
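For question 4, the two regularizers I am comparing, as I understand them (a sketch, assuming plain gradient descent on the penalty term):

```python
def l2_reg_grad(w):
    """Gradient of w0^2 + w1^2 + ...: pulls each weight toward zero
    in proportion to its size (ridge-style shrinkage)."""
    return [2.0 * wi for wi in w]

def l1_reg_grad(w):
    """Gradient of |w0| + |w1| + ...: a constant-magnitude sign(w) pull,
    which tends to drive small weights exactly to zero (sparsity)."""
    return [1.0 if wi > 0 else -1.0 if wi < 0 else 0.0 for wi in w]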
I have many more questions (and a sense of irony, because I learned NNs in the past and have since forgotten them), so I am trying to leave my brain some breadcrumbs.