
Applied math/scientific computing:

If hiring someone, I would describe some practical problems involving the solution of finite-dimensional linear systems, and ask how they would suggest solving them in each circumstance (LU, QR, SVD, iterative methods, etc.). More discussion-oriented than rote recital of algorithm structure, probably.

I would do something similar for functional characterization/pattern extraction via FFT/DCT/SVD/wavelets/ICA/etc.

Finally, I would construct some problems involving appropriate selection of statistical inference models and associated hypothesis tests (from the usual Fisher and Neyman-Pearson perspectives). The basics of Bayesian inference are good to know as well.



Interesting - this is exactly the kind of answer I'm looking to learn from. Are these really so utterly fundamental that someone could do them under pressure without thinking?

Is it the case that someone who can't answer these near automatically is, in effect, incompetent (in this field)?

I'm wondering if you've set the bar higher than FizzBuzz is in computing, or if this really is equivalent.

Thanks!


I would agree with the op here. SVD in particular is so fundamental to the idea of data analysis I couldn't imagine talking to someone who even flinched at hearing it. Most other things fit at roughly the same level in my mind.

I feel like the mechanics of test selection bias against people who've specialized in Bayesian stuff—which might be very interesting! It's definitely a make-or-break kind of thing as to whether you can correctly formulate all the various moving parts and relate them correctly.


For this list, I'm basically assuming that I'm interviewing a candidate for a Ph.D. level (or equivalent) numerical analyst R&D type position. If they don't feel comfortable discussing the basics of these items, then they probably can't do too much more for us than algorithm plug&chug. Which has its value, but not what I had in mind.

That said, I also wouldn't expect someone to write out implementations on a whiteboard under pressure (I'm not a Google recruiter). I'd be more interested in prodding their brain to gauge their general level of understanding, which is (I'm pretty sure) how hiring committees for mathematicians in academia operate. If they have a good foundation, I think it's less important that they've rote memorized implementation details.


Do you have any example problems to share? I've taken a numerical analysis class and these interview-like questions are appealing.


These are rough guidelines based on current best-practices that I know of, and shouldn't be treated as doctrine obviously. Numerical analysis/linear algebra is actually a pretty fast evolving field as far as applied math goes. Though statistics is a bit less dynamic at the moment, I'd say.

Honestly, after a certain amount of time, I'd expect a new hire would be able to teach me what is state-of-the-art in the field based on new literature.

But the questions I'd have in mind would be couched like this:

Linear system:

    - Is it small, square, and numerically well-conditioned? Use LU - it's pretty fast to write and pretty fast to use in practice.

    - Is it small, but rectangular (i.e. overdetermined), or not as well conditioned? QR is a good choice.

    - Is it small, but terribly conditioned, or do you want to do rank revealing, or low-rank approximation while you're at it? Will you be using this matrix to solve many problems (multiple right-hand sides)? SVD would fit the bill.

    - Is it large, or sparse, or implicitly-defined (i.e. you don't actually have access to the elements of the matrix defining the system - you just have a surrogate function that gives you vectors in its range, or something)? Use an iterative algorithm. Krylov subspace methods (MINRES, GMRES, conjugate gradient, etc.) are your friends here.
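The decision table above could be sketched roughly like this (a NumPy/SciPy sketch; the matrix sizes, the shift making A well-conditioned, and the stencil operator are all made-up illustrations, not prescriptions):

```python
import numpy as np
from scipy.linalg import lu_factor, lu_solve, lstsq
from scipy.sparse.linalg import gmres, LinearOperator

rng = np.random.default_rng(0)

# Small, square, well-conditioned: LU factorization.
A = rng.standard_normal((50, 50)) + 50 * np.eye(50)
b = rng.standard_normal(50)
x_lu = lu_solve(lu_factor(A), b)

# Small but rectangular (overdetermined): QR-based least squares.
B = rng.standard_normal((100, 20))
c = rng.standard_normal(100)
x_qr, *_ = lstsq(B, c, lapack_driver="gelsy")  # gelsy = pivoted QR

# Badly conditioned / rank-revealing: truncated-SVD pseudoinverse.
U, s, Vt = np.linalg.svd(B, full_matrices=False)
keep = s > 1e-10 * s[0]                 # drop numerically-zero singular values
x_svd = Vt[keep].T @ ((U[:, keep].T @ c) / s[keep])

# Large / implicitly defined: Krylov method on a matrix-free operator.
n = 1000
matvec = lambda v: 4 * v + np.roll(v, 1) + np.roll(v, -1)  # e.g. a stencil
op = LinearOperator((n, n), matvec=matvec)
x_it, info = gmres(op, np.ones(n))      # info == 0 means it converged
```

Note the last case never materializes the matrix: GMRES only needs the `matvec` surrogate, which is exactly the "implicitly-defined" situation described above.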

Pattern matching (more specific in question formulation):

    - If you wanted to determine the "strength" of a waveform (in a finite uniform sampling of data) that recurs in a fairly regular way (like the arterial pulse in an array of data taken from an oximeter), what type of transformation would you use, and how would you use the resulting information in the transform domain?

    - What if you wanted to determine the strength of a waveform that is short-lived/impulsive in nature, but recurs without any known periodicity (e.g. eye-blink artifacts in a sample of EEG data)?

    - How would your answers to the above questions change if there are n separate channels of data collected simultaneously (i.e. sampled in different locations), which may be analyzed together?
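For the first question, a minimal sketch of the FFT answer (synthetic data standing in for the oximeter trace; the 1.2 Hz pulse rate, sampling rate, and noise level are assumptions): the magnitude of the transform at the pulse frequency measures the waveform's strength. For the short-lived, aperiodic case you'd reach for a wavelet or matched-filter decomposition instead, since a global Fourier basis smears transients across all frequencies.

```python
import numpy as np

fs = 250.0                          # hypothetical sampling rate, Hz
pulse_hz = 1.2                      # hypothetical "arterial pulse" rate
t = np.arange(0, 10, 1 / fs)
rng = np.random.default_rng(1)
# Synthetic signal: a 1.2 Hz sinusoid of amplitude 2 buried in unit noise.
x = 2.0 * np.sin(2 * np.pi * pulse_hz * t) + rng.standard_normal(t.size)

# Real FFT; the dominant non-DC bin locates the recurring waveform.
X = np.fft.rfft(x)
freqs = np.fft.rfftfreq(t.size, d=1 / fs)
mag = np.abs(X)
idx = 1 + np.argmax(mag[1:])        # skip the DC bin
peak = freqs[idx]                   # estimated pulse frequency
amplitude = 2 * mag[idx] / t.size   # amplitude of a real sinusoid on a bin
```

The `2/N` scaling recovers the time-domain amplitude of a real sinusoid whose frequency sits on an FFT bin; off-bin components would call for windowing or interpolation.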

Statistical analysis (might seem vague, but I'd be more interested in good discussion with a candidate here than in actual whiteboard writing):

    - What does statistical significance mean, in the context of decision making? Is it a property of the test you perform, or is it a property of your data? (sort of a trick question, this basically rehashes the Fisher vs. Neyman & Pearson debates of the 20th century stats community)

    - Some canned problems on when to use z, t, and F tests. Basically, you use them when your situation matches the appropriate inference model (comparing the means of two normally distributed samples with equal variances? t-test.)

    - How do you construct an optimal test from scratch, if one doesn't already exist for your particular situation? (Basically, if minimizing type II error at a fixed type I error rate suits your problem, can you use the Neyman-Pearson lemma to construct the likelihood-ratio test correctly?)

    - What does a p-value actually mean? What if you instead wanted actual probabilities for your hypotheses, or you had a priori information that you wanted to use? (Bayesian inference is the winner here.)

    - Probably something from point estimation: least squares, the minimax criterion, Bayesian MAP estimates, general model fitting, that sort of thing. Brings it all back around to numerics (where I'm most comfortable). Like how statistical ridge regression is just Tikhonov regularization when you get down to implementation.
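To make that last point concrete, a small sketch (random data, made-up regularization weight) of ridge regression as Tikhonov regularization: it is just a regularized least-squares solve, and the numerically preferable route is an augmented least-squares system rather than the normal equations.

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.standard_normal((100, 10))
b = rng.standard_normal(100)
lam = 0.5                                    # illustrative regularization weight

# Ridge / Tikhonov: minimize ||Ax - b||^2 + lam * ||x||^2.
# Closed form via the regularized normal equations:
x_normal = np.linalg.solve(A.T @ A + lam * np.eye(10), A.T @ b)

# Better conditioned: solve the augmented least-squares problem
#   [ A; sqrt(lam) I ] x ~= [ b; 0 ]
# whose normal equations are identical, but which avoids forming A.T @ A.
A_aug = np.vstack([A, np.sqrt(lam) * np.eye(10)])
b_aug = np.concatenate([b, np.zeros(10)])
x_aug, *_ = np.linalg.lstsq(A_aug, b_aug, rcond=None)
```

The augmented form squares neither the matrix nor its condition number, which is exactly the numerical-analysis instinct the bullet above is getting at.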


Thanks for such an extended reply!



