"empower all people," "help they deserve," "status quo is unacceptable," "they have had decades to find a solution to help the masses. They have failed," "facilitate this _movement_," "increase the rate of innovation across the world"...
Who do you think you are? MLK? You're a fancy-pants version of "Goodwill" that literally caters to just the 1%, get over it.
I do believe everything that is written there. If you want to take it as inflated or egotistical, go ahead -- that's your right.
If you understand the mission and what we're trying to do here, though, we're really aiming to help everyone. Starting with software is simply a foothold that works when you're an early-stage startup. It lets us focus the business and run the model against an industry that moves so ridiculously fast that we're forced to keep up. As we build tech to scale coaching, we'd like to reach price points low enough to help everyone.
That's the plan at least. Not over it! Will keep dreaming!
About 90% of positions of the scrollbar are unusable. You have to scroll like crazy to find that specific position that actually displays the page correctly. Horrible.
The page introduces another, parallel scrollbar (that has the same function as the built-in scroll bar), and it works really slowly and breaks the built-in UI. To web designers: you have a built-in UI, everyone is used to it, it works, it is tested -- use it!
Seems like genetic algorithms, particle swarms, etc. would be more attractive choices, since they solve the same problem and are inherently parallelizable, while Metropolis-Hastings is almost 100 years old and was designed for a four-function calculator.
Although I guess some people have meta-parallelized it... but still seems like a patch job compared to modern likelihood navigation algos.
The point of Metropolis-Hastings is to sample from a distribution when you do not know the partition function. It is one of the most important building blocks in a family of algorithms broadly known as Markov chain Monte Carlo. These algorithms are particularly useful when performing Bayesian statistics.
Genetic algorithms will not give you samples from a distribution; they only perform optimization. Particle swarms also focus on optimization, and on top of that, they do not seem to have either theoretical justification or empirical success.
MH is embarrassingly parallel since you run multiple chains at the same time. Again, the point isn't optimization (that would be simulated annealing) but sampling.
Being 100 years old is also largely irrelevant. People publish new algorithms to get publications all the time; that doesn't mean they necessarily outperform the old ones. Gradient descent is the basic algorithm used in training all of these cool new deep learning algorithms, and it's much older than MH.
Yes, there are more recent improvements to MH, the two biggest ones being Hamiltonian Monte Carlo (which uses gradient information) and Parallel Tempering (which is somewhat similar to homotopy optimization), but that's hardly a reason to dismiss the importance of this algorithm.
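To make the partition-function point concrete, here's a minimal random-walk Metropolis sketch (plain NumPy, purely illustrative; the target, step size, and chain length are arbitrary). The target density only ever enters through a ratio, so the unknown normalizing constant cancels and never has to be computed.

    import numpy as np

    def metropolis_hastings(log_p, x0, n_samples=10000, step=0.5, seed=0):
        # log_p: log of the *unnormalized* target density.
        rng = np.random.default_rng(seed)
        x = np.asarray(x0, dtype=float)
        log_px = log_p(x)
        samples = np.empty((n_samples, x.size))
        for i in range(n_samples):
            x_new = x + step * rng.standard_normal(x.size)  # symmetric proposal
            log_px_new = log_p(x_new)
            # Accept with probability min(1, p(x_new)/p(x));
            # the normalizing constant cancels in this ratio.
            if np.log(rng.random()) < log_px_new - log_px:
                x, log_px = x_new, log_px_new
            samples[i] = x
        return samples

    # Toy target: an unnormalized 1-D standard normal.
    draws = metropolis_hastings(lambda x: -0.5 * np.sum(x ** 2), x0=[0.0])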
"The point of Metropolis-Hastings is to sample from a distribution when you do not know the partition function."
That's one point, yes. The other is optimization. In which case I prefer the others I mentioned.
"they do not seem to have either theoretical justification or empirical success."
That's just patently false. They _do_ have empirical success. And besides, a lot of these variants have MH implemented inside of them to some extent o.o
"MH is embarrassingly parallel since you run multiple chains at the same time."
I can make six single grilled cheese sandwiches in 2 minutes, but it takes me 8 minutes to make a 6-decker grilled cheese. Is that a parallel process?
"Being 100 years old is also largely irrelevant."
In my opinion it is relevant, since computation can now be done in completely different ways than it could 100 years ago.
1) Can you point out where particle swarm optimization has been successfully applied? Papers specifically about particle swarm optimization carry very little weight, as it is very easy to design toy problems where any optimization technique is going to perform well; I'm looking for actual, practical use.
2) When you are sampling a distribution, you're not trying to make a 6-decker grilled cheese, you're trying to make many grilled cheese sandwiches.
3) In completely new ways? Not really. Algorithmic complexity dominates run time independent of the computing medium. An algorithm designed to be efficiently run by a group of human "computers" with calculators is probably very similar to the same algorithm designed to be run by a CPU. If anything, the CPU-optimized algorithm is likely to benefit from more sequential processing and less parallelism.
2) You are trying to find the global maximum. How do you not understand the value of communication when searching for a maximum in the likelihood distribution? You're just being intentionally obtuse.
3) Yes, really.
A story:
You have a landscape with mountains and hills and you have one person trying to find the tallest mountain. That person is blind, they can't see shit. That person is also mute and deaf. Their only sense is a vibrating altimeter. They get drunk, and climb mountains for 100000 days, trying to find the tallest mountain.
You are advocating the idea that you should send 1000 of these blind deaf mutes out there one at a time (running these in parallel is just faster serial) and then they should vote on which mountain is the tallest at the end.
I'm saying you should send a bunch of not-deaf-mutes out there (ie implement mutation, breeding, cross-contamination, gravitation, whatever) so they can tell each other where the stupid mountains are (this requires _parallel_) and they don't waste their whole time stumbling around (burning in).
HN does not allow downvoting replies to one's comment, so someone else must be downvoting you. Your tone is aggressive and you display a poor grasp of the topic which is probably why you're being downvoted.
1) This is an abstract of a PhD thesis which makes no mention of particle swarms; I couldn't find the full text. This is your evidence?
2) No, this isn't about looking for the global maximum. Several people have explained to you that this is a sampling algorithm, but you still fail to show understanding of the difference.
3) Your story doesn't demonstrate your point, and you do not understand the argument I presented to you. The type of algorithm suited for an army of people with calculators 100 years ago isn't fundamentally different from the type suited for computers today, and if anything, it's likely to be more sequential, not less.
One of the parallelized versions is Gibbs sampling, which is used for sampling from Bayesian networks. In this case you don't even need a proposal distribution, nor the accept/reject test from MH.
I think GraphLab (https://dato.com/) comes with an implementation of something of this kind.
The trouble with particle swarms/genetic algorithms is that they aren't guaranteed to sample from the underlying p.d. It is not yet apparent whether you can find the mode of a distribution faster by choosing a Markov chain whose stationary distribution is different from the underlying one.
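As a toy illustration of the Gibbs point above (a standard bivariate normal with an arbitrarily chosen correlation rho), each coordinate is drawn exactly from its full conditional given the other, so there is no proposal distribution and no accept/reject test:

    import numpy as np

    def gibbs_bivariate_normal(rho=0.8, n_samples=5000, seed=0):
        # Both full conditionals of a standard bivariate normal are normals,
        # so each coordinate is sampled directly -- no proposal, no MH test.
        rng = np.random.default_rng(seed)
        x, y = 0.0, 0.0
        sd = np.sqrt(1.0 - rho ** 2)
        samples = np.empty((n_samples, 2))
        for i in range(n_samples):
            x = rng.normal(rho * y, sd)  # x | y ~ N(rho*y, 1 - rho^2)
            y = rng.normal(rho * x, sd)  # y | x ~ N(rho*x, 1 - rho^2)
            samples[i] = (x, y)
        return samples

    samples = gibbs_bivariate_normal()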
Are you sure that Gibbs sampling isn't just the multivariate version of MH?
What I'm saying is that convergence speed for MH is limited by the fact that guesses cannot communicate with each other... which doesn't matter when you have a pencil and a four-function calculator, as when it was designed.
A genetic algorithm or a particle swarm algorithm is capable of much swifter convergence because the guesses _can_ communicate and influence the direction of the drunken walk.
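For concreteness, this is roughly what that "communication" looks like in a bare-bones global-best particle swarm (the objective, bounds, and coefficients below are arbitrary illustration values, and note this is an optimizer, not a sampler):

    import numpy as np

    def particle_swarm(f, dim=2, n_particles=30, n_iters=200,
                       w=0.7, c1=1.5, c2=1.5, seed=0):
        # Each particle is pulled toward its own best position (memory)
        # and toward the swarm's best position (the "communication").
        rng = np.random.default_rng(seed)
        x = rng.uniform(-5, 5, (n_particles, dim))  # positions
        v = np.zeros_like(x)                        # velocities
        pbest, pbest_val = x.copy(), np.array([f(p) for p in x])
        gbest = pbest[np.argmin(pbest_val)].copy()
        for _ in range(n_iters):
            r1, r2 = rng.random((2, n_particles, dim))
            v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
            x = x + v
            vals = np.array([f(p) for p in x])
            better = vals < pbest_val
            pbest[better], pbest_val[better] = x[better], vals[better]
            gbest = pbest[np.argmin(pbest_val)].copy()
        return gbest

    # Toy use: minimize a simple quadratic bowl.
    best = particle_swarm(lambda p: np.sum(p ** 2))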
When I was in grad school, we used MH to compute 400-dimensional integrals. We computed ground state and excited state properties of a particular system, using a wavefunction as the probability distribution. This was easily parallelized.
While I can't say that one guess communicated with others, I can say that whenever I moved one particle, the others knew about it immediately because the wavefunction described a strongly correlated system. Communication between guesses sounds really interesting though. I've been out of the game for a while, so I'll have to look that up.
If your goal is optimization, then MH is a bad choice. This is fine, since MH is simply not an optimization algorithm! It's designed to draw accurate samples from a posterior distribution which you can evaluate but not find a functional form for.
It's a vastly more challenging problem than mere optimization.
It is a multivariate (coordinate-wise) version of MH. You can parallelize it because the Bayes net allows a decomposition of the p.d.
I feel like, while your comment on global-optimization algorithms may indeed be true, I don't yet quite believe that the hacks they involve are that general.
MH wasn't designed for global optimization, and there is only one "particle". I guess this is what you meant by "parallel"?
Frequently you'll see things along the lines of "we ran 10,000 MCMC iterations to find the solution" and my first thought is "that must be a lot of wasted cycles."
I think I see what you mean now about the parallelization by decomposing the variables (splitting the problem into i separate problems which can be chained independently?) -- I didn't know that was a possibility. I'll have to look at that.
Well, you can run multiple chains simultaneously and pool their results. There's some wasted work, since each chain needs to burn in, but modern algorithms can make that go quite quickly.
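A minimal sketch of what running multiple chains and throwing away burn-in might look like (toy 1-D target; the chain count, step size, and burn-in length here are arbitrary, and in practice you'd hand this to an MCMC library rather than roll your own):

    import numpy as np

    def run_chain(log_p, x0, n_samples, step, seed):
        # One random-walk Metropolis chain over a 1-D target.
        rng = np.random.default_rng(seed)
        x, log_px = float(x0), log_p(x0)
        out = np.empty(n_samples)
        for i in range(n_samples):
            x_new = x + step * rng.standard_normal()
            log_px_new = log_p(x_new)
            if np.log(rng.random()) < log_px_new - log_px:
                x, log_px = x_new, log_px_new
            out[i] = x
        return out

    log_p = lambda x: -0.5 * x ** 2  # toy unnormalized target: standard normal
    n_chains, n_samples, burn_in = 8, 5000, 500

    # Independent chains from scattered starting points; each pays its own
    # burn-in cost, then the post-burn-in draws are pooled.
    chains = [run_chain(log_p, x0=5.0 * (k - n_chains / 2), n_samples=n_samples,
                        step=1.0, seed=k) for k in range(n_chains)]
    pooled = np.concatenate([c[burn_in:] for c in chains])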
This article seems to adequately describe the definition and obvious shortcomings of a _thesis_, which is what a _master's_ student should be writing. A _doctoral_ student should be writing a _dissertation_.
I'm not sure why so many people want a doctoral degree as opposed to a master's. It doesn't offer a significant financial return on the investment over the master's, and it's only really required for entrance to the ivory tower... and once you're there, you're going to be dealing with way more B.S. than the horrors of writing an introduction...
The entire point of the dissertation is to formulate a complete picture of where things belong and how they interact. It should start from nothing and progress to your contribution. _It should outline pitfalls, mistakes, musings, ideas, future work, etc._ -- all things which are included in paper publications only minimally (and which referees often ask to have removed for being off topic!).
Yes, it takes 6 months of headbashing LaTeX editing, it comes out with stupid errors on the front page, and no one will ever read it besides yourself and your advisors; but that's honestly one of the best lessons you can learn about life in the ivory tower... You _will_ waste months of your life going nowhere, you _will_ have more headbashingly mundane paperwork to do than you like to think, you _will_ screw it up (and need to understand that every paper has some dirty little secret), you _will_ babble on about things no one cares about... That's what being an academic is!
If you just care about progress for publications and get a pained feeling when your time appears (appears! every fuck up is a valuable lesson) to have been wasted, then you are not an academic and, yes, the experience will be pointless for you.
The truly disgusting thing is these advisors who are streamlining and mechanizing the process. Go ahead and check out which schools pump out the most PhDs per faculty and per time invested (China and India? I'm fairly certain?). Which journals will accept the most articles per time invested (again the same). Are these places onto something no one else is onto? They are capable of producing far more quickly and efficiently than the "archaic" systems...
The only example the article gives of industrial design and mechanical engineering being fused is the heat-vent one, where you took the vents off the top of the sweat-lodge-shaped computer and put them on the bottom because it looked "like a large salt shaker"...
But doesn't heat rise? Now the vents are in the thermodynamically opposite place from where they should be, and simultaneously sheltered from crosswind by the gumdrop shape. That sounds less like a meshing of the two fields than an intentional crippling of the engineering portion...
What I was really hoping for was a layman's translation of the maths on Wikipedia [i.e., how to implement it]. But this is a good jumping-off point for figuring out which black box to use.
I'm just saying it takes about 30 seconds to explain the Metropolis-Hastings algorithm in plain English, and the Wikipedia article is almost intentionally esoteric on the matter:
This solution is like trying to stir a pot of chili with an absinthe spoon.
I'm not sure why there's this holy grail of "the unified master version of" whatever.
Let me give an example. Say I write a paper on the shape of the non-dark-matter (stellar) density in the Milky Way by looking at y-type stars, and I get an answer x. Now Bob comes along, looks at y2-type stars, and gets answer x2. People have the idea that you just go to the first paper and add a footnote to y and x showing the alternate values y2 and x2...
But what that doesn't take into account is the fact that I used telescope a (a 10-meter Hawaiian behemoth) and pointed it at one beam of the sky for 8 hours to get an ultra-deep pencil, while Bob used telescope a2 (a modest 1.8-meter in La Palma) that took an all-sky survey and only goes very shallow. Now we add this in a footnote.
Next, there's a critical difference in the stars we studied. My y-type stars take 8 Gyr just to form, but Bob was using y2-type stars which live anywhere from 100 Myr to 15 Gyr. So I'm looking at the old stars and he's looking at all the stars. Since we know that different age stars live in different parts of the galaxy (old in the halo, young in the disk and bulge), our results are starting to look not as comparable as we thought... but it's minor, we'll add a footnote.
But then we realize that, since my old stars are giants and his all age stars are dwarfs, my stars are way brighter than his. Since my telescope is monstrous, and his is a small surveyor, my stars actually end up being observed to a distance 10 times that of his sample. In fact at these distances, the original model is a bad fit and we need to change from a power law to an Einasto profile. Bob can do that too, so our answers are easily comparable, but the Einasto law has more parameters so it would give a worse fit per parameter value than the power law he wanted to use originally... We add an appendix to the paper to explain this bit.
Then I notice that Bob's been using infrared data, and in the infrared there's a well known problem separating stars and galaxies in the data on telescope a2. In fact, Bob has to write a whole new section on some probabilistic tests and models he uses to adequately remove these galaxies from his y2 star sample. My telescope, observing in the optical at high resolution, has no such problem, so that section doesn't exist in my paper. Bob looks around awkwardly and stickies a hyperlink to his meta-analysis somewhere in my data section.
Then Jill comes along and says she doesn't agree at all with us; she got value x3 using the distribution of dwarf galaxies and if you believe in theory z, then _hers_ is the most accurate answer.
And we tell Jill to go write her own fucking paper.
"empower all people," "help they deserve," "status quo is unacceptable," "they have had decades to find a solution to help the masses. They have failed," "facilitate this _movement_," "increase the rate of innovation across the world"...
Who do you think you are? MLK? You're a fancy-pants version of "Goodwill" that literally caters to just the 1%, get over it.