I have a strange situation. Long story short, a laptop with an Intel i5-8350U runs a Python script about twice as fast on a single core as a desktop with an AMD Ryzen 2700X.

Now a few more details.
First, the specs of the machines.
AMD
Ryzen 2700x(stock)
16 GB of DDR4@2133(Dual channel)
Data on SATA HDD

INTEL
i5-8350u
16 GB of DDR4@2400(Dual channel)
Data on USB3 HDD

The laptop runs macOS Catalina 10.15.2 with the latest Anaconda and Python 3.8.1. The desktop runs Ubuntu 18.04.3, also with the latest Anaconda and Python 3.8.1. The one detail worth noting is that I have built numpy with OpenBLAS.

The script generates corner plots from posterior files. There are 300 objects in total. I have serial and parallel versions of this code, and the results are as follows:

Results

i5 - 21m22s in single and 6m25s in parallel  
ryzen - 40m44s in single and 3m34s in parallel  
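
To put numbers on the gap, here is a minimal sketch that converts the timings above to seconds and derives the ratios. The helper name `to_seconds` and the dictionary layout are only for illustration; the four timings are the ones reported above.

```python
import re


def to_seconds(stamp):
    """Convert a string such as '21m22s' to a number of seconds."""
    match = re.fullmatch(r"(\d+)m(\d+)s", stamp)
    if match is None:
        raise ValueError(f"unexpected timing format: {stamp!r}")
    minutes, seconds = (int(group) for group in match.groups())
    return minutes * 60 + seconds


# (single-core, parallel) wall-clock times as reported in the question.
timings = {
    "i5": ("21m22s", "6m25s"),
    "ryzen": ("40m44s", "3m34s"),
}

seconds = {
    cpu: (to_seconds(single), to_seconds(parallel))
    for cpu, (single, parallel) in timings.items()
}

# How much slower the Ryzen is than the i5 in the serial run.
single_core_ratio = seconds["ryzen"][0] / seconds["i5"][0]

# Speedup each machine gets from going parallel.
parallel_speedup = {cpu: single / par for cpu, (single, par) in seconds.items()}

print(f"Ryzen / i5 single-core time: {single_core_ratio:.2f}x")
for cpu, speedup in parallel_speedup.items():
    print(f"{cpu}: parallel speedup {speedup:.2f}x")
```

So the Ryzen takes roughly 1.9 times as long in the serial run, yet gains about 11x from the parallel version against about 3.3x on the i5.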

Is that normal? Anything I can do to improve Ryzen performance?

OBS: I'm aware that Ryzen performance depends on memory speed, so I will probably try to overclock the RAM and retest.

OBS2: I suspect something similar to what is described here, so maybe there is a software fix: AMD-Ryzen-3900X-vs-Intel-Xeon-2175W

  • Intel CPUs have always had better single-core performance than AMD, and AMD has a slight advantage in multi-core performance. It's always been like this (there is a technical explanation, but I'm not very familiar with it). If you want to improve the Ryzen's performance, you can either run the script on more cores or overclock it for a slight difference.
    – CaldeiraG
    Commented Jan 24, 2020 at 13:54
  • I suggest you try a benchmark program on both to verify actual memory and cpu speed. As well as HDD benchmarks. RAM is probably the issue though, as well as possible optimization.
    – Natsu Kage
    Commented Jan 24, 2020 at 14:01
  • @CaldeiraG Sure, I'm aware of Intel's edge in single-core performance. But the case here is that I'm comparing a desktop CPU against a laptop CPU and seeing a huge difference... Commented Jan 24, 2020 at 14:11
  • Single core performance should be vaguely comparable: cpubenchmark.net/compare/… so I'm surprised at such a big difference. How large in GB is the set of files? What actually is the "USB3 HDD", is it an SSD? Could it be disk access times being slow?
    – Mokubai
    Commented Jan 24, 2020 at 15:09
  • @Mokubai The total size of the input data is about 17 GB. Each object is between 50 and 100 MB. The USB3 HDD is a hard disk connected via an external USB3 port. Commented Jan 24, 2020 at 16:21

1 Answer

Anaconda uses Intel MKL as its default BLAS library, which runs slowly on AMD processors (see Ryzen and Intel's Anti-competitive MKL).

You have a couple of choices. I'd really appreciate it if you could try both of them and report the results here (as a comment).

  1. just set the environment variable MKL_DEBUG_CPU_TYPE=5; see here for more information; this will "fool MKL into using an AVX2 optimization level on AMD CPU's";

  2. install the conda package nomkl to opt out of MKL and use OpenBLAS instead; see "Uninstalling MKL" on this page.
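
A minimal sketch for testing option 1 without touching your shell configuration: it runs the same command twice in child processes, once with the unmodified environment and once with MKL_DEBUG_CPU_TYPE=5, and returns both wall-clock times. The function names `env_with_mkl_override` and `compare` are made up for this illustration. The variable is set in the child's environment so that it is in place before MKL is loaded; whether it has any effect depends on the MKL version installed, so treat the result as a measurement rather than a guarantee.

```python
import os
import subprocess
import sys
import time


def env_with_mkl_override(base_env=None):
    """Return a copy of the environment with MKL_DEBUG_CPU_TYPE=5 added."""
    env = dict(os.environ if base_env is None else base_env)
    env["MKL_DEBUG_CPU_TYPE"] = "5"
    return env


def time_command(cmd, env):
    """Run cmd in a child process with env; return elapsed seconds."""
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def compare(cmd):
    """Time cmd with the default environment and with the MKL override."""
    baseline = time_command(cmd, dict(os.environ))
    overridden = time_command(cmd, env_with_mkl_override())
    return baseline, overridden
```

For example, `compare([sys.executable, "-c", "import numpy as np; a = np.random.rand(2000, 2000); a @ a"])` times a BLAS-heavy matrix product; replace the command with your own script to test the real workload. If the two times are nearly identical, either MKL is not the bottleneck or the variable is being ignored.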

You should see a huge improvement with either solution. Nonetheless, note that AMD CPUs older than the Zen 2 architecture (Ryzen 3rd gen), which includes the 2700X, are slower than Intel CPUs at executing AVX2 instructions for architectural reasons.

EDIT: sorry, I misread the command I had pasted in solution 2 (conda install -c anaconda nomkl). It doesn't create a new environment, as I wrote; it just installs nomkl. I removed the command and added a link that explains in detail everything you need to do to remove MKL.