Well, fair enough: presumably GHC is creating 5 kernel threads under the covers, whereas the C implementation creates 503 kernel threads. Therefore the C implementation incurs approximately 100x the context-switching overhead. Once again, apples to oranges.
the threads are scheduled amongst the available CPU cores.
Actually, the per-CPU idle stats suggest that the Haskell program ran entirely on a single CPU, so it wasn't using all 4 cores anyway.
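For anyone who wants to check this on their own machine, here is a minimal sketch (not the benchmark program itself) that forks 503 lightweight threads and reports how many capabilities the runtime is using to run them. The helper name `spawnAndWait` is made up for illustration, and I'm assuming a reasonably recent GHC where `getNumCapabilities` is exported from `Control.Concurrent`.

```haskell
import Control.Concurrent (forkIO, getNumCapabilities, rtsSupportsBoundThreads)
import Control.Concurrent.MVar (newEmptyMVar, putMVar, takeMVar)
import Control.Monad (forM, forM_)

-- Hypothetical helper: fork n lightweight (green) threads, wait for every
-- one of them to signal completion, and return how many finished.
spawnAndWait :: Int -> IO Int
spawnAndWait n = do
  dones <- forM [1 .. n] $ \_ -> do
    done <- newEmptyMVar
    _ <- forkIO (putMVar done ())  -- each thread just signals and exits
    pure done
  forM_ dones takeMVar             -- block until all threads have signalled
  pure (length dones)

main :: IO ()
main = do
  caps <- getNumCapabilities       -- capabilities that can run Haskell code in parallel
  finished <- spawnAndWait 503
  putStrLn ("threaded RTS: " ++ show rtsSupportsBoundThreads)
  putStrLn ("capabilities: " ++ show caps)
  putStrLn ("lightweight threads finished: " ++ show finished)
```

Built with plain `ghc`, or with `-threaded` but no `+RTS -N`, the runtime defaults to a single capability, which would be consistent with the single-CPU behaviour in the idle stats. To spread the threads over 4 cores you would build with `ghc -threaded -rtsopts` and run with `+RTS -N4`. I don't know which flags the benchmark entry was actually built and run with, so treat this as one possible explanation rather than a diagnosis.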