AMD and Nvidia both used VLIW in the past, and both moved away from it because they couldn't get it to run efficiently. If even embarrassingly parallel workloads can't run efficiently on VLIW architectures, I doubt general-purpose CPU code will either.
The final versions of Itanic started adopting all the branch predictors and trappings from more traditional chips.
The problem is that loop behavior cannot, in general, be fully predicted at compile time (that's the halting problem). Modern OoO CPUs are basically hardware JITs that change execution paths and patterns on the fly based on previous behavior. At least at present, that runtime information seems to be much better than what the compiler can see, and the result is much better real-world performance.
If you look at uops-executed-per-port benchmarks, you can see that CPUs are far from all-seeing eyes.