gratilup's comments | Hacker News

The vector units are a full 256 bits wide now; that's how AMD can claim a 2x floating-point performance improvement. It also doesn't have an "AVX" clock offset.


Now imagine a Threadripper with the Zen 2 cores; the higher IPC and frequency would certainly be welcome. I have the 32-core 2990WX and it's an incredible CPU for compiling large C++ programs, running big test suites, and never having to worry about running too many tasks at the same time.


Conjecture on my part, but I wouldn't be surprised if we also see a 64 core Threadripper - there's going to be a 64 core Epyc:

https://wccftech.com/amd-7nm-epyc-64-core-32-core-cpus-specs...

Although they may save it for a "Zen 2+" or something similar, like they did with the 32-core Threadripper.


I am specifically waiting for a 64-core Threadripper. It would also be great if 32GB ECC UDIMMs became available by that time to bump RAM from 128GB to 256GB. That computer could then last a decade.


I'm looking forward to dual-socket 64-core Epyc at work (128 cores).


A lot of AAA game studios with large C++ codebases rely heavily on Incredibuild. I can imagine having something with this level of parallelism would be incredibly useful.


You don't even need Incredibuild; MSBuild or just the plain /MP option in VC++ can take advantage of it. A build of the Unreal Engine 4 client from VS takes around 2 minutes, for example.


I mean, yes, but even that is not enough once the project is big enough. I work on a huge AAA game in C++ and on my 8-core 16-threaded Xeon the whole thing compiles in 40 minutes. Incredibuild is a must to keep the compilation times even remotely acceptable.


Agreed, utilizing multiple processes helps a lot, although a larger number of cores across the entire network helps even more with compiling the mass of translation units. Most workplaces where I've done C++ have had extra IncrediBuild agents running on Intel Xeons to help with that.


Icecream is the distributed compiler to use on Linux. Add ccache as needed.


Hell, I have the older 16 core 1950X and it's amazing for compiling large codebases. I'd heartily recommend these things, performance for dollar is fantastic.


I have the 24-core 2970WX and can confirm, it is very nice for Rust development.


Nice. What build times are you seeing for clean builds of the rust toolchain itself? Curious to benchmark against my 2700x. I'd imagine near linear scaling with the core count.

I think the 3900x might be a happy middle ground. I'm guessing we would probably see, with the increased IPC, core count, and clocks, something like 70-80% gains over a 2700x in these kinds of multithreaded workloads. So probably slightly more than halfway to a 2970WX or 2990WX?


3900x looks fantastic on paper. In general, if the Ryzen stuff is sufficient for your needs, it's a better value. You pay a big premium for the Threadripper boards (and big case and big cooling solution). So in that sense, the 3900x is definitely in a sweet spot at the top of the Ryzen range.

Tradeoffs: Threadripper boards officially support ECC; Ryzen boards are hit or miss. TR boards tend to be priced around $300, whereas you can get a Ryzen board for $100ish. TR has had (in prior generations) twice the DRAM channels and way more PCIe lanes than Ryzen, so if you're doing GPU-intensive work or something else that uses lots of PCIe, that's a plus. Not to mention the additional core count over Ryzen, although with greater inter-die latency. Not sure what that will look like with TR3.

Is the 3900X worth $500 list over the 3800X at $400 list? Actually, yeah, it looks at least 25% better to me (especially the doubled L3) if you can use the cores. The 3800X is overpriced; they are probably learning from the 1700<->1800 dynamic in gen 1. Is it worth it over the 3700X at $330? Maybe not.

For me the question is really: how long will Ryzen 3000 be on the market before those better IPC/clocks/core densities show up in TR3? PCIe 4.0 support is huge; AMD wasn't anemic on PCIe lanes with Zen and Zen+, and PCIe 4.0 doubles the bandwidth of 3.0. Hopefully those IPC gains do not come attached to Spectre/Meltdown-like vulnerabilities. I'm excited for a Zen 2 TR3! That might be worth an upgrade from the 1950X. Meanwhile, it doesn't seem like Intel will get to PCIe 4.0 until 2020 (although that's reasonably soon).


3800X is "gamer priced" :).

I think the 3900x is in a great position to provide the best of both gaming and productivity. Extremely aggressively priced at $500 for the horsepower it seems to give you.

I suspect there is going to be a 16 core 3950x later in the year. Maybe with slightly lower single core frequencies. But maybe 20-25% greater multicore performance.

I bet they are delaying that to keep something up their sleeves when Intel responds. And to not totally cannibalize TR prior to releasing TR3.


The X570 boards are going to be around $100-200 more expensive though. The PCB is a bit different and the specifications are tighter for PCIe 4.0. I think the cheapest boards you'll see soon will start above $150 at the very low end and go all the way up to $600 or so. Many boards from the prior two generations will have issues running the newer CPUs, and board vendors are recommending against Zen 2 on chipsets prior to X570.


Yeah - I wish I didn’t have to go threadripper to get ECC. I don’t need that much power.


As a compiler engineer, I'd say those are some of the least impressive things done by modern compilers. Why? Because there isn't anything smart behind them; it's pure pattern matching.
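
To show what I mean by pattern matching, here is a deliberately toy sketch (nothing like a real compiler's IR): the optimizer just checks the shape of a node and rewrites it in place, no analysis required.

    #include <cstdio>
    #include <cstdint>

    // Toy IR node - real compiler IRs are far richer; this only shows the
    // "match a shape, rewrite it in place" idea behind peephole rules.
    struct Expr {
        enum Kind { Var, Const, Mul, Shl } kind;
        int64_t value;      // constant value when kind == Const
        Expr *lhs, *rhs;    // operands for binary nodes
    };

    // Peephole rule: x * C  ->  x << log2(C) when C is a power of two.
    // Purely structural matching on the tree, nothing smart behind it.
    void simplifyMulByPow2(Expr &e) {
        if (e.kind == Expr::Mul && e.rhs && e.rhs->kind == Expr::Const &&
            e.rhs->value > 0 && (e.rhs->value & (e.rhs->value - 1)) == 0) {
            int64_t shift = 0;
            for (int64_t c = e.rhs->value; c > 1; c >>= 1) ++shift;
            e.kind = Expr::Shl;
            e.rhs->value = shift;
        }
    }

    int main() {
        Expr x{Expr::Var, 0, nullptr, nullptr};
        Expr c{Expr::Const, 8, nullptr, nullptr};
        Expr mul{Expr::Mul, 0, &x, &c};
        simplifyMulByPow2(mul);
        // Prints kind=3 (Shl) and shift=3, i.e. x * 8 became x << 3.
        std::printf("kind=%d shift=%lld\n", (int)mul.kind, (long long)mul.rhs->value);
    }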


Now I'm curious, what are the most impressive things modern compilers do?



You basically just said that the Windows kernel code is beautiful, because they are using exactly the same coding style...


To be fair, the Windows kernel code is beautiful in a lot of ways.


I'm the main developer of the new optimizer. It's a bit too much to say it replaces the old one; it's more of an addition. I was aware of this issue with initializing local arrays; it was on the TODO list - hopefully for the next VS.
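
For anyone wondering what the issue looks like in source, a rough sketch (not the exact code from the report): zero-initializing a large local aggregate like the one below should be lowered to a single block fill, not one store per element.

    #include <cstdio>

    struct Packet {
        int  header[4];
        char payload[256];
    };

    // The initialization in question: a naive lowering emits one store per
    // element, the efficient form is a single memset-sized block fill.
    int checksum() {
        Packet p = {};
        int sum = 0;
        for (char c : p.payload)
            sum += c;
        return sum + p.header[0];
    }

    int main() {
        std::printf("%d\n", checksum());   // prints 0
    }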


I look forward to a fix, especially if the (I think) larger issue of large aggregates not being initialized efficiently is addressed. It feels like it is another instance of the same issue.

I hope the Python script is helpful for finding some of the odd variants that currently exist.


Why isn't the SSA optimizer mentioned in the Update 3 release notes? Is it enabled by default in Update 3?


Are you allowed to respond directly to the author in this case?


There's not much preventing any engineer at Microsoft from talking to developers. No lawyers need to be involved :-)


Visual C++ does not use the GCC naming for flags. The flag will likely be renamed to something shorter in the final release in Update 3.


That's why that pattern is not applied in the new optimizer, unless the Bit Estimator proves x+1 does not overflow. After it was implemented to behave like other compilers, I analyzed several applications and libraries and found it would have introduced silent security problems - I explained this in the blog post.
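
A made-up illustration of the distinction (not one of the cases from the blog post): in the first function the Bit Estimator can see that x fits in 8 bits, so x + 1 can never wrap and the comparison can be folded to true; in the second it cannot prove that, so the fold is skipped.

    #include <cstdio>
    #include <cstdint>

    // Safe to fold: x is masked to [0, 255], so x + 1 is at most 256 and
    // cannot overflow a 32-bit int; the comparison is provably always true.
    bool alwaysTrue(int32_t v) {
        int32_t x = v & 0xFF;
        return x + 1 > x;
    }

    // Not safe to fold blindly: if x == INT32_MAX, x + 1 overflows (undefined
    // behavior), and in practice wraps negative, making the comparison false.
    bool maybeOverflow(int32_t x) {
        return x + 1 > x;
    }

    int main() {
        std::printf("%d %d\n", alwaysTrue(INT32_MAX), maybeOverflow(5));  // 1 1
    }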


The optimizer will be ported to .NET Native soon; it should help code quality quite a lot. The JIT is a completely different matter though: it's a different project, done by a different team - copy-pasting the source code is definitely not going to work. JIT compilers also have different priorities, so it's unlikely they would port all parts of a large optimizer.


There are only a few places where the undefined behavior is exploited right now. For those cases, if there is overflow in the original expression the program is pretty much screwed anyway; they are also done by LLVM/GCC. The case that mostly concerns people when they hear about undefined behavior (a+C > a -> true) was avoided, since it would silently break too many applications.
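
To make "silently break" concrete, here is the classic (technically undefined, but very common) overflow-check idiom - a hypothetical sketch, not one of the actual cases I found. If a + C > a were folded to true unconditionally, the guard below would simply disappear.

    #include <cstdint>
    #include <cstdlib>

    // Classic guard against signed wrap-around before an allocation.
    // If "bytes + 16 > bytes" were folded to true, the early return would
    // vanish and bytes + 16 below could silently wrap for huge inputs.
    void *allocWithHeader(int32_t bytes) {
        if (bytes + 16 <= bytes)          // the (technically UB) overflow check
            return nullptr;
        return std::malloc(static_cast<size_t>(bytes + 16));
    }

    int main() {
        void *p = allocWithHeader(100);   // well-behaved case
        std::free(p);
    }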


Since when does an overflow mean the program is screwed? Perhaps I misunderstand your meaning, but not all integer overflows, and in fact probably only a small percentage of them, mean that a program is going to die or have some similarly bad behavior.


What I meant to say is that if you have an overflow in those cases, the result of the expression is definitely not what you wanted - this is going to propagate and "damage" other expressions. Applying the optimization in that case might produce a different result. An example is this new transformation from the blog post: (a * C1) / C2 -> a * (C1/C2), where a * C1 might overflow. For 8-bit numbers with a = 106 and C1 = C2 = 17 we get:

    initial:   (106 * 17) / 17 = 10 / 17 = 0      (106 * 17 = 1802 wraps to 10 in 8 bits)

    optimized: 106 * (17 / 17) = 106 * 1 = 106

So in this case the optimized version gives the expected result - it's still different than the initial expression, so it falls under the "undefined overflow" optimizations category.
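
The same example written out as code, if anyone wants to play with it (using uint8_t so the wrap-around is well defined here, unlike the signed case the optimization actually has to worry about):

    #include <cstdio>
    #include <cstdint>

    int main() {
        uint8_t a = 106, C1 = 17, C2 = 17;

        // Force the intermediate results back into 8 bits after every step,
        // mimicking an 8-bit machine (in C++ the operands promote to int).
        uint8_t initial   = static_cast<uint8_t>(static_cast<uint8_t>(a * C1) / C2);
        uint8_t optimized = static_cast<uint8_t>(a * static_cast<uint8_t>(C1 / C2));

        // 106 * 17 = 1802 wraps to 10, so initial = 0; optimized = 106.
        std::printf("initial=%d optimized=%d\n", initial, optimized);
    }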


I tried to have a simple example; maybe it was too simple. You are right, an OR with a constant is enough to know it is not zero. For this case the Bit Estimator also knows b = [3, 4083]; writing something that takes advantage of this info would have been more interesting.
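
For example, something with this shape would give exactly that range (a hypothetical reconstruction - the expression in the post may be different): the AND keeps only bits 4..11, so at most 4080, and the OR forces the two low bits on, so at least 3 and never zero.

    #include <cstdio>

    // b is provably in [3, 4083] and never zero, so a range/bit analysis can
    // fold the guard below to true and keep the division unconditional.
    int demo(int x) {
        int b = (x & 0xFF0) | 3;
        if (b != 0)
            return 4096 / b;
        return 0;
    }

    int main() {
        std::printf("%d\n", demo(12345));   // 12345 & 0xFF0 = 48, | 3 = 51 -> 80
    }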

