gratilup's comments | Hacker News

The vector units are a full 256 bits wide now; that's how AMD can claim a 2x floating-point performance improvement. It also doesn't have an "AVX" clock offset.


Now imagine a Threadripper with the Zen 2 cores; the higher IPC and frequency would certainly be welcome. I have the 32-core 2990WX and it's an incredible CPU for compiling large C++ programs, running big test suites, and never having to worry about running too many tasks at the same time.


Conjecture on my part, but I wouldn't be surprised if we also see a 64 core Threadripper - there's going to be a 64 core Epyc:

https://wccftech.com/amd-7nm-epyc-64-core-32-core-cpus-specs...

Although they may save it for a "Zen 2+" or something similar, like they did with the 32-core Threadripper.


I am specifically waiting for a 64-core Threadripper. It would also be great if 32GB ECC UDIMMs became available by that time to bump RAM from 128GB to 256GB. That computer could then last a decade.


I'm looking forward to dual-socket 64-core Epyc at work (128 cores).


A lot of AAA game studios with large C++ codebases rely heavily on Incredibuild. I can imagine having something with this level of parallelism would be incredibly useful.


You don't even need Incredibuild; MSBuild or just the plain /MP option in VC++ can take advantage of it. A build of the Unreal Engine 4 client from VS takes around 2 minutes, for example.


I mean, yes, but even that is not enough once the project is big enough. I work on a huge AAA game in C++ and on my 8-core 16-threaded Xeon the whole thing compiles in 40 minutes. Incredibuild is a must to keep the compilation times even remotely acceptable.


Agreed, utilizing multiple processes helps a lot, although a larger number of cores across the entire network helps even more with compiling the mass of translation units. Most workplaces where I've done C++ have had extra IncrediBuild agents running on Intel Xeons to help with that.


Icecream is the distributed compiler to use on Linux. Add ccache as needed.


Hell, I have the older 16 core 1950X and it's amazing for compiling large codebases. I'd heartily recommend these things, performance for dollar is fantastic.


I have the 24-core 2970WX and can confirm, it is very nice for Rust development.


Nice. What build times are you seeing for clean builds of the rust toolchain itself? Curious to benchmark against my 2700x. I'd imagine near linear scaling with the core count.

I think the 3900x might be a happy middle ground. I'm guessing we would probably see, with the increased IPC, core count, and clocks, something like 70-80% gains over a 2700x in these kinds of multithreaded workloads. So probably slightly more than halfway to a 2970WX or 2990WX?


3900x looks fantastic on paper. In general, if the Ryzen stuff is sufficient for your needs, it's a better value. You pay a big premium for the Threadripper boards (and big case and big cooling solution). So in that sense, the 3900x is definitely in a sweet spot at the top of the Ryzen range.

Tradeoffs: Threadripper boards officially support ECC; Ryzen boards are hit or miss. TR boards tend to be priced around $300, whereas you can get a Ryzen board for $100ish. TR has had (in prior generations) twice the DRAM channels and way more PCIe lanes than Ryzen, so if you're doing GPU-intensive work or something else that uses lots of PCIe, that's a plus. Not to mention the additional core count over Ryzen, although with greater inter-die latency. Not sure what that will look like with TR3.

Is the 3900X worth $500 list over the 3800X at $400 list? Actually, yeah, it looks at least 25% better to me (especially the doubled L3) if you can use the cores. The 3800X is overpriced; they are probably learning from the 1700<->1800 dynamic in gen 1. Is it worth it over the 3700X at $330? Maybe not.

For me the question is really: how long will Ryzen 3000 be on the market before those better IPC/clocks/core densities show up in TR3? PCIe 4.0 support is huge; AMD wasn't anemic on PCIe lanes with Zen and Zen+, and PCIe 4.0 doubles the bandwidth of 3.0. Hopefully those IPC gains do not come attached to Spectre/Meltdown-like vulnerabilities. I'm excited for a Zen 2 TR3! That might be worth an upgrade from the 1950X. Meanwhile, it doesn't seem like Intel will get to PCIe 4.0 until 2020 (although that's reasonably soon).


3800X is "gamer priced" :).

I think the 3900x is in a great position to provide the best of both gaming and productivity. Extremely aggressively priced at $500 for the horsepower it seems to give you.

I suspect there is going to be a 16 core 3950x later in the year. Maybe with slightly lower single core frequencies. But maybe 20-25% greater multicore performance.

I bet they are delaying that to keep something up their sleeves when Intel responds. And to not totally cannibalize TR prior to releasing TR3.


The X570 boards are going to be around $100-200 more expensive though. The PCB is a bit different and the specifications are tighter for PCIe 4.0. I think the cheapest boards you'll see soon will start above $150 at the very low end and go all the way up to $600 or so. Many boards from the prior two generations will have issues running the newer CPUs, and board vendors are recommending against Zen 2 on chipsets prior to X570.


Yeah - I wish I didn’t have to go threadripper to get ECC. I don’t need that much power.


As a compiler engineer, I'd say those are some of the least impressive things done by modern compilers. Why? Because there isn't anything smart behind them; it's pure pattern matching.
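
To show what I mean by pattern matching, here is a deliberately toy sketch (nothing like a real compiler's IR): the optimizer just checks the shape of a node and rewrites it in place, no analysis required.

    #include <cstdio>
    #include <cstdint>

    // Toy IR node - real compiler IRs are far richer; this only shows the
    // "match a shape, rewrite it in place" idea behind peephole rules.
    struct Expr {
        enum Kind { Var, Const, Mul, Shl } kind;
        int64_t value;      // constant value when kind == Const
        Expr *lhs, *rhs;    // operands for binary nodes
    };

    // Peephole rule: x * C  ->  x << log2(C) when C is a power of two.
    // Purely structural matching on the tree, nothing smart behind it.
    void simplifyMulByPow2(Expr &e) {
        if (e.kind == Expr::Mul && e.rhs && e.rhs->kind == Expr::Const &&
            e.rhs->value > 0 && (e.rhs->value & (e.rhs->value - 1)) == 0) {
            int64_t shift = 0;
            for (int64_t c = e.rhs->value; c > 1; c >>= 1) ++shift;
            e.kind = Expr::Shl;
            e.rhs->value = shift;
        }
    }

    int main() {
        Expr x{Expr::Var, 0, nullptr, nullptr};
        Expr c{Expr::Const, 8, nullptr, nullptr};
        Expr mul{Expr::Mul, 0, &x, &c};
        simplifyMulByPow2(mul);
        // Prints kind=3 (Shl) and shift=3, i.e. x * 8 became x << 3.
        std::printf("kind=%d shift=%lld\n", (int)mul.kind, (long long)mul.rhs->value);
    }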


Now I'm curious, what are the most impressive things modern compilers do?



You basically just said that the Windows kernel code is beautiful, because they are using exactly the same coding style...


To be fair, the Windows kernel code is beautiful in a lot of ways.


I'm the main developer of the new optimizer. It's a bit too much to say it replaces the old one; it's more of an addition. I was aware of this issue with initializing local arrays; it was on the TODO list - hopefully for the next VS.
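
For anyone wondering what the issue looks like in source, a rough sketch (not the exact code from the report): zero-initializing a large local aggregate like the one below should be lowered to a single block fill, not one store per element.

    #include <cstdio>

    struct Packet {
        int  header[4];
        char payload[256];
    };

    // The initialization in question: a naive lowering emits one store per
    // element, the efficient form is a single memset-sized block fill.
    int checksum() {
        Packet p = {};
        int sum = 0;
        for (char c : p.payload)
            sum += c;
        return sum + p.header[0];
    }

    int main() {
        std::printf("%d\n", checksum());   // prints 0
    }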


I look forward to a fix, especially if the (I think) larger issue of large aggregates not being initialized efficiently is addressed. It feels like it is another instance of the same issue.

I hope the Python script is helpful for finding some of the odd variants that currently exist.


Why isn't the SSA optimizer mentioned in the Update 3 release notes? Is it enabled by default in Update 3?


Are you allowed to respond directly to the author in this case?


There's not much preventing any engineer at Microsoft from talking to developers. No lawyers need to be involved :-)


Visual C++ does not use the GCC naming for flags. The flag will likely be renamed to something shorter in the final release in Update 3.


That's why that pattern is not applied in the new optimizer, unless the Bit Estimator proves x+1 does not overflow. After it was implemented to behave like other compilers, I analyzed several applications and libraries and found it would have introduced silent security problems - I explained this in the blog post.
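
A made-up illustration of the distinction (not one of the cases from the blog post): in the first function the Bit Estimator can see that x fits in 8 bits, so x + 1 can never wrap and the comparison can be folded to true; in the second it cannot prove that, so the fold is skipped.

    #include <cstdio>
    #include <cstdint>

    // Safe to fold: x is masked to [0, 255], so x + 1 is at most 256 and
    // cannot overflow a 32-bit int; the comparison is provably always true.
    bool alwaysTrue(int32_t v) {
        int32_t x = v & 0xFF;
        return x + 1 > x;
    }

    // Not safe to fold blindly: if x == INT32_MAX, x + 1 overflows (undefined
    // behavior), and in practice wraps negative, making the comparison false.
    bool maybeOverflow(int32_t x) {
        return x + 1 > x;
    }

    int main() {
        std::printf("%d %d\n", alwaysTrue(INT32_MAX), maybeOverflow(5));  // 1 1
    }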


The optimizer will be ported to .NET Native soon; it should help code quality quite a lot. The JIT is a completely different matter though: it's a different project, done by a different team - copy-pasting the source code is definitely not going to work. JIT compilers also have different priorities, so it's unlikely they would port all parts of a large optimizer.


There are only a few places where the undefined behavior is exploited right now. For those cases, if there is overflow in the original expression the program is pretty much screwed anyway; they are also done by LLVM/GCC. The case that mostly concerns people when they hear about undefined behavior (a+C > a -> true) was avoided, since it would silently break too many applications.
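
To make "silently break" concrete, here is the classic (technically undefined, but very common) overflow-check idiom - a hypothetical sketch, not one of the actual cases I found. If a + C > a were folded to true unconditionally, the guard below would simply disappear.

    #include <cstdint>
    #include <cstdlib>

    // Classic guard against signed wrap-around before an allocation.
    // If "bytes + 16 > bytes" were folded to true, the early return would
    // vanish and bytes + 16 below could silently wrap for huge inputs.
    void *allocWithHeader(int32_t bytes) {
        if (bytes + 16 <= bytes)          // the (technically UB) overflow check
            return nullptr;
        return std::malloc(static_cast<size_t>(bytes + 16));
    }

    int main() {
        void *p = allocWithHeader(100);   // well-behaved case
        std::free(p);
    }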


Since when does an overflow mean the program is screwed? Perhaps I misunderstand your meaning, but not all integer overflows, and in fact probably only a small percentage of them, mean that a program is going to die or have some similarly bad behavior.


What I meant to say is that if you have an overflow in those cases, the result of the expression is definitely not what you wanted - this is going to propagate and "damage" other expressions. Applying the optimization in that case might produce a different result. An example is this new transformation from the blog post: (a * C1) / C2 -> a * (C1/C2), where a * C1 might overflow. For 8-bit numbers with a = 106 and C1 = C2 = 17 we get:

    initial:   (106 * 17) / 17 = 10 / 17 = 0      (106 * 17 = 1802 wraps to 10 in 8 bits)

    optimized: 106 * (17 / 17) = 106 * 1 = 106

So in this case the optimized version gives the expected result - it's still different than the initial expression, so it falls under the "undefined overflow" optimizations category.
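
The same example written out as code, if anyone wants to play with it (using uint8_t so the wrap-around is well defined here, unlike the signed case the optimization actually has to worry about):

    #include <cstdio>
    #include <cstdint>

    int main() {
        uint8_t a = 106, C1 = 17, C2 = 17;

        // Force the intermediate results back into 8 bits after every step,
        // mimicking an 8-bit machine (in C++ the operands promote to int).
        uint8_t initial   = static_cast<uint8_t>(static_cast<uint8_t>(a * C1) / C2);
        uint8_t optimized = static_cast<uint8_t>(a * static_cast<uint8_t>(C1 / C2));

        // 106 * 17 = 1802 wraps to 10, so initial = 0; optimized = 106.
        std::printf("initial=%d optimized=%d\n", initial, optimized);
    }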


I tried to have a simple example; maybe it was too simple. You are right, an OR with a constant is enough to know it is not zero. For this case the Bit Estimator also knows b = [3, 4083]; writing something that takes advantage of this info would have been more interesting.
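
For example, something with this shape would give exactly that range (a hypothetical reconstruction - the expression in the post may be different): the AND keeps only bits 4..11, so at most 4080, and the OR forces the two low bits on, so at least 3 and never zero.

    #include <cstdio>

    // b is provably in [3, 4083] and never zero, so a range/bit analysis can
    // fold the guard below to true and keep the division unconditional.
    int demo(int x) {
        int b = (x & 0xFF0) | 3;
        if (b != 0)
            return 4096 / b;
        return 0;
    }

    int main() {
        std::printf("%d\n", demo(12345));   // 12345 & 0xFF0 = 48, | 3 = 51 -> 80
    }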

