
One of the parallelized versions is Gibbs sampling, which is used for sampling from Bayesian networks. In this case you don't even need a proposal distribution, nor the accept/reject test from MH.
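
For concreteness, here's a toy Gibbs sweep for a bivariate normal with known correlation rho (my own sketch, not any particular library's implementation) -- note there is no proposal distribution and no accept/reject step, because each full conditional can be sampled exactly:

    import numpy as np

    rho = 0.8                  # assumed correlation of the toy bivariate-normal target
    x, y = 0.0, 0.0            # arbitrary starting point
    samples = []
    for _ in range(5000):
        # Each full conditional of a bivariate normal is itself normal, so we
        # draw from it exactly -- no proposal distribution, no MH accept/reject.
        x = np.random.normal(rho * y, np.sqrt(1 - rho**2))
        y = np.random.normal(rho * x, np.sqrt(1 - rho**2))
        samples.append((x, y))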

I think Graphlab (https://dato.com/) ships with something along these lines.

The trouble with particle swarms/genetic algorithms is that they aren't guaranteed to sample from the underlying probability distribution. It also isn't yet clear that you can find the mode of a distribution faster by choosing a Markov chain whose stationary distribution differs from the underlying one.



Are you sure that Gibbs sampling isn't just the multivariate version of MH?

What I'm saying is that convergence speed for MH is limited by the fact that guesses cannot communicate with each other... which didn't matter when all you had was a pencil and a four-function calculator, as when the algorithm was designed.

A genetic algorithm or a particle swarm algorithm is capable of much swifter convergence because the guesses _can_ communicate and influence the direction of the drunken walk.
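
Rough sketch of what I mean by guesses communicating -- in a textbook particle swarm update, every particle's velocity is pulled toward the swarm-wide best as well as its own best, so information about good regions spreads each iteration (toy objective and constants are my own choices):

    import numpy as np

    def f(x):                              # toy objective: minimize a shifted sphere
        return np.sum((x - 3.0)**2, axis=-1)

    n_particles, dim, n_iters = 30, 5, 200
    w, c1, c2 = 0.7, 1.5, 1.5              # inertia / attraction weights (typical values)

    pos = np.random.uniform(-10, 10, (n_particles, dim))
    vel = np.zeros_like(pos)
    pbest = pos.copy()                     # each particle's own best position
    gbest = pos[np.argmin(f(pos))].copy()  # swarm-wide best: the shared information

    for _ in range(n_iters):
        r1 = np.random.rand(n_particles, dim)
        r2 = np.random.rand(n_particles, dim)
        # Velocity is pulled toward both the particle's own best and the
        # swarm's best, so every guess influences every other guess.
        vel = w * vel + c1 * r1 * (pbest - pos) + c2 * r2 * (gbest - pos)
        pos = pos + vel
        improved = f(pos) < f(pbest)
        pbest[improved] = pos[improved]
        if f(pos).min() < f(gbest):
            gbest = pos[np.argmin(f(pos))].copy()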


When I was in grad school, we used MH to compute 400-dimensional integrals. We computed ground state and excited state properties of a particular system, using a wavefunction as the probability distribution. This was easily parallelized.

e.g. http://phya.snu.ac.kr/~gsjeon/Primary/Abstract/abs26.html

While I can't say that one guess communicated with others, I can say that whenever I moved one particle, the others knew about it immediately because the wavefunction described a strongly correlated system. Communication between guesses sounds really interesting though. I've been out of the game for a while, so I'll have to look that up.
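
If it helps, the 1-D cartoon of what we were doing looks something like this: use Metropolis to draw configurations with probability |psi|^2 and average an observable over them (here the harmonic-oscillator ground state, where <x^2> should come out near 0.5 -- the real thing had 400 coordinates and a much nastier wavefunction):

    import numpy as np

    def log_psi_sq(x):
        # |psi(x)|^2 for the 1-D harmonic-oscillator ground state, up to
        # normalization: psi ~ exp(-x^2 / 2), so |psi|^2 ~ exp(-x^2).
        return -x**2

    x, vals = 0.0, []
    for _ in range(50000):
        prop = x + np.random.randn()               # random-walk move
        if np.log(np.random.rand()) < log_psi_sq(prop) - log_psi_sq(x):
            x = prop
        vals.append(x**2)                          # accumulate the observable

    print(np.mean(vals[5000:]))                    # ~0.5 after discarding burn-in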


Check out this guy's work for some really cool examples: http://en.wikipedia.org/wiki/Xin-She_Yang


If your goal is optimization then MH is a bad choice. This is fine, since MH is simply not an optimization algorithm! It's designed to draw accurate samples from a posterior distribution that you can evaluate but can't find a functional form for.
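
To make that concrete, a bare-bones random-walk Metropolis loop -- all it needs is the ability to evaluate the target up to a constant, which is exactly the situation above (toy bimodal target of my own choosing):

    import numpy as np

    def log_unnorm(x):
        # Unnormalized log-density: we can evaluate it pointwise, but the
        # normalizing constant is unknown (toy bimodal target).
        return np.logaddexp(-0.5 * (x - 2)**2, -0.5 * (x + 2)**2)

    x, samples = 0.0, []
    for _ in range(10000):
        prop = x + np.random.randn()               # random-walk proposal
        # Accept with probability min(1, p(prop)/p(x)); the unknown
        # normalizing constant cancels in the ratio.
        if np.log(np.random.rand()) < log_unnorm(prop) - log_unnorm(x):
            x = prop
        samples.append(x)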

It's a vastly more challenging problem than mere optimization.


It is a multivariate (coordinate-wise) version of MH. You can parallelize it because the Bayes net allows a decomposition of the probability distribution.
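
A toy illustration of that decomposition (a ring of spins rather than a proper Bayes net, but the same idea): colour the variables so that each colour class is conditionally independent given the rest, then every site in a class can be updated simultaneously:

    import numpy as np

    # Ring of Ising spins with coupling J: given the even sites, the odd sites
    # are conditionally independent, so each colour class can be updated all
    # at once (and, in principle, spread across many cores).
    J, n, n_sweeps = 1.0, 100, 500
    s = np.random.choice([-1, 1], size=n)

    def update(idx):
        left = np.roll(s, 1)[idx]          # left neighbours of the chosen sites
        right = np.roll(s, -1)[idx]        # right neighbours
        p_up = 1.0 / (1.0 + np.exp(-2.0 * J * (left + right)))
        s[idx] = np.where(np.random.rand(idx.size) < p_up, 1, -1)

    odd, even = np.arange(1, n, 2), np.arange(0, n, 2)
    for _ in range(n_sweeps):
        update(odd)                        # all odd sites simultaneously
        update(even)                       # all even sites simultaneously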

While your comment on global-optimization algorithms may well be true, I don't yet believe that the hacks they involve are quite that general.

MH wasn't designed for global optimization, and there is only one "particle". I guess this is what you meant by "parallel"?


Yeah, that's what I meant, the one particle bit.

Frequently you'll see things along the lines of "we ran 10,000 MCMC iterations to find the solution" and my first thought is "that must be a lot of wasted cycles."

I think I see what you mean now about parallelization by decomposing the variables (splitting the problem into separate subproblems which can be chained independently?) -- I didn't know that was a possibility. I'll have to look at that.


Well, you can run multiple simultaneous chains and combine their results. There's some wasted work since each will need to burn in, but modern algorithms can make that go quite quickly.
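
Sketch of what that looks like (each chain is completely independent, so the loop below could just as well be a process pool; burn-in is discarded per chain before pooling):

    import numpy as np

    def log_p(x):                          # toy unnormalized log-target
        return -0.5 * x**2

    def run_chain(n_steps, seed):
        rng = np.random.default_rng(seed)
        x, out = rng.normal(), []
        for _ in range(n_steps):
            prop = x + rng.normal()
            if np.log(rng.random()) < log_p(prop) - log_p(x):
                x = prop
            out.append(x)
        return np.array(out)

    burn_in, n_steps = 1000, 10000
    # The chains are independent, so this loop could be parallelized trivially.
    chains = [run_chain(n_steps, seed) for seed in range(8)]
    pooled = np.concatenate([c[burn_in:] for c in chains])   # drop burn-in, then pool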


Sure - but that's not what the OP is referring to. There is no "sync" step in multiple parallel chains, unlike particle swarm & genetic algorithms.

MCMC is also used for finding MAP solutions, an operation which is strictly less difficult than computing the partition function.
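
One common way to do the MAP bit, for what it's worth, is simulated annealing: run Metropolis on a tempered version of the posterior and keep the best point seen as the temperature drops (toy posterior and schedule are my own):

    import numpy as np

    def log_post(x):                       # toy unnormalized log-posterior
        return -0.5 * (x - 1.0)**2 - 0.1 * x**4

    n_steps = 20000
    x = 0.0
    best_x, best_lp = x, log_post(x)
    for t in range(n_steps):
        T = max(0.01, 1.0 - t / n_steps)   # slowly lower the temperature
        prop = x + 0.5 * np.random.randn()
        # Metropolis step on the tempered posterior p(x)^(1/T); as T -> 0
        # the chain concentrates near the mode.
        if np.log(np.random.rand()) < (log_post(prop) - log_post(x)) / T:
            x = prop
        if log_post(x) > best_lp:
            best_x, best_lp = x, log_post(x)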



