I have financial trading software that decodes FAST/FIX messages. I'm running the same binaries on two different machines with a very similar set of data. The software receives "messages" and decodes them. As a general rule, a longer message takes more time to decode:

i7-860, Windows 7:

Debug 18:23:48.8047325 count=51 decoding take microseconds = 300
Debug 18:23:49.7287854 count=53 decoding take microseconds = 349
Debug 18:23:49.7397860 count=110 decoding take microseconds = 516
Debug 18:23:49.7497866 count=92 decoding take microseconds = 512
Debug 18:23:49.7597872 count=49 decoding take microseconds = 267
Debug 18:23:49.7717878 count=194 decoding take microseconds = 823
Debug 18:23:49.7797883 count=49 decoding take microseconds = 296
Debug 18:23:49.7997894 count=50 decoding take microseconds = 299
Debug 18:23:50.7328428 count=101 decoding take microseconds = 583
Debug 18:23:50.7418433 count=42 decoding take microseconds = 281
Debug 18:23:50.7538440 count=151 decoding take microseconds = 764
Debug 18:23:50.7618445 count=57 decoding take microseconds = 279
Debug 18:23:50.7738452 count=122 decoding take microseconds = 712
Debug 18:23:50.8028468 count=52 decoding take microseconds = 281
Debug 18:23:51.7389004 count=137 decoding take microseconds = 696
Debug 18:23:51.7499010 count=100 decoding take microseconds = 485
Debug 18:23:51.7689021 count=185 decoding take microseconds = 872
Debug 18:23:51.8079043 count=49 decoding take microseconds = 315
Debug 18:23:52.7349573 count=90 decoding take microseconds = 532
Debug 18:23:52.7439578 count=53 decoding take microseconds = 277
Debug 18:23:52.7539584 count=134 decoding take microseconds = 623
Debug 18:23:52.7629589 count=47 decoding take microseconds = 294
Debug 18:23:52.7749596 count=198 decoding take microseconds = 868
Debug 18:23:52.8039613 count=52 decoding take microseconds = 291
Debug 18:23:53.7400148 count=132 decoding take microseconds = 666
Debug 18:23:53.7480153 count=81 decoding take microseconds = 430
Debug 18:23:53.7570158 count=49 decoding take microseconds = 301
Debug 18:23:53.7710166 count=156 decoding take microseconds = 752
Debug 18:23:53.7770169 count=45 decoding take microseconds = 270
Debug 18:23:54.7350717 count=108 decoding take microseconds = 578
Debug 18:23:54.7430722 count=52 decoding take microseconds = 286
Debug 18:23:54.7540728 count=138 decoding take microseconds = 567
Debug 18:23:54.7760741 count=160 decoding take microseconds = 753
Debug 18:23:54.8030756 count=53 decoding take microseconds = 292
Debug 18:23:55.7411293 count=110 decoding take microseconds = 629
Debug 18:23:55.7481297 count=48 decoding take microseconds = 294
Debug 18:23:55.7591303 count=84 decoding take microseconds = 386
Debug 18:23:55.7701309 count=90 decoding take microseconds = 484
Debug 18:23:55.7801315 count=120 decoding take microseconds = 527
Debug 18:23:55.8101332 count=53 decoding take microseconds = 290
Debug 18:23:56.7341861 count=121 decoding take microseconds = 667
Debug 18:23:56.7421865 count=53 decoding take microseconds = 293
Debug 18:23:56.7531872 count=127 decoding take microseconds = 586
Debug 18:23:56.7621877 count=58 decoding take microseconds = 306
Debug 18:23:56.7751884 count=138 decoding take microseconds = 649
Debug 18:23:56.8021900 count=53 decoding take microseconds = 288
Debug 18:23:57.7392436 count=139 decoding take microseconds = 699
Debug 18:23:57.7502442 count=121 decoding take microseconds = 548
Debug 18:23:57.7582446 count=61 decoding take microseconds = 301
Debug 18:23:57.7692453 count=98 decoding take microseconds = 500
Debug 18:23:57.7792458 count=94 decoding take microseconds = 460
Debug 18:23:57.8092476 count=41 decoding take microseconds = 274

Xeon E3-1220, Windows Server 2008 R2 foundation:

Debug 18:28:57.5087967 count=117 decoding take microseconds = 255
Debug 18:28:57.5087967 count=85 decoding take microseconds = 187
Debug 18:28:57.5087967 count=55 decoding take microseconds = 155
Debug 18:28:57.5243967 count=86 decoding take microseconds = 189
Debug 18:28:57.5243967 count=53 decoding take microseconds = 139
Debug 18:28:57.5243967 count=52 decoding take microseconds = 153
Debug 18:28:57.5243967 count=55 decoding take microseconds = 146
Debug 18:28:57.5243967 count=103 decoding take microseconds = 239
Debug 18:28:57.5243967 count=83 decoding take microseconds = 182
Debug 18:28:57.5243967 count=85 decoding take microseconds = 180
Debug 18:28:57.5243967 count=80 decoding take microseconds = 202
Debug 18:28:57.5243967 count=58 decoding take microseconds = 135
Debug 18:28:57.5243967 count=55 decoding take microseconds = 140
Debug 18:28:57.5243967 count=81 decoding take microseconds = 183
Debug 18:28:57.5243967 count=74 decoding take microseconds = 172
Debug 18:28:57.5243967 count=80 decoding take microseconds = 174
Debug 18:28:57.5243967 count=88 decoding take microseconds = 175
Debug 18:28:57.5243967 count=55 decoding take microseconds = 131
Debug 18:28:57.5243967 count=80 decoding take microseconds = 182
Debug 18:28:57.5243967 count=80 decoding take microseconds = 183
Debug 18:28:57.5243967 count=101 decoding take microseconds = 231
Debug 18:28:57.5243967 count=58 decoding take microseconds = 134
Debug 18:28:57.5243967 count=57 decoding take microseconds = 126
Debug 18:28:57.5243967 count=57 decoding take microseconds = 134
Debug 18:28:57.5399967 count=115 decoding take microseconds = 234
Debug 18:28:57.5399967 count=106 decoding take microseconds = 225
Debug 18:28:57.5399967 count=108 decoding take microseconds = 241
Debug 18:28:57.5399967 count=84 decoding take microseconds = 177
Debug 18:28:57.5399967 count=54 decoding take microseconds = 141
Debug 18:28:57.5399967 count=84 decoding take microseconds = 186
Debug 18:28:57.5399967 count=82 decoding take microseconds = 184
Debug 18:28:57.5399967 count=82 decoding take microseconds = 179
Debug 18:28:57.5399967 count=56 decoding take microseconds = 133
Debug 18:28:57.5399967 count=57 decoding take microseconds = 127
Debug 18:28:57.5399967 count=82 decoding take microseconds = 185
Debug 18:28:57.5399967 count=76 decoding take microseconds = 178
Debug 18:28:57.5399967 count=82 decoding take microseconds = 184
Debug 18:28:57.5399967 count=54 decoding take microseconds = 139
Debug 18:28:57.5399967 count=54 decoding take microseconds = 137
Debug 18:28:57.5399967 count=81 decoding take microseconds = 184
Debug 18:28:57.5399967 count=136 decoding take microseconds = 275
Debug 18:28:57.5399967 count=55 decoding take microseconds = 138
Debug 18:28:57.5555968 count=52 decoding take microseconds = 140
Debug 18:28:57.5555968 count=53 decoding take microseconds = 136
Debug 18:28:57.5555968 count=54 decoding take microseconds = 139
Debug 18:28:57.5555968 count=55 decoding take microseconds = 138
Debug 18:28:57.5555968 count=57 decoding take microseconds = 134
Debug 18:28:57.5555968 count=53 decoding take microseconds = 136
Debug 18:28:57.5555968 count=80 decoding take microseconds = 174
Debug 18:28:57.5555968 count=74 decoding take microseconds = 175
Debug 18:28:57.5555968 count=57 decoding take microseconds = 133
Debug 18:28:57.5555968 count=57 decoding take microseconds = 149
Debug 18:28:57.5555968 count=100 decoding take microseconds = 262
Debug 18:28:57.5555968 count=56 decoding take microseconds = 156
Debug 18:28:57.5555968 count=55 decoding take microseconds = 165

From this test I see that the E3-1220 is about two times faster than the i7-860.
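
Eyeballing two logs with different message mixes can mislead. One rough way to compare them is to fit microseconds = intercept + slope * count to each log and compare the slopes (cost per unit of "count") and intercepts (fixed per-message cost). This is a minimal sketch that assumes every line follows the format shown above. The function names are illustrative, and the demo uses only three lines from each log.

```python
import re

# Matches the timing part of lines such as:
# "Debug 18:23:48.8047325 count=51 decoding take microseconds = 300"
LINE_RE = re.compile(r"count=(\d+)\s+decoding take microseconds\s*=\s*(\d+)")


def parse_log(text):
    """Return a list of (count, microseconds) pairs found in the log text."""
    return [(int(c), int(us)) for c, us in LINE_RE.findall(text)]


def fit_line(samples):
    """Ordinary least-squares fit of microseconds = intercept + slope * count."""
    n = len(samples)
    if n < 2:
        raise ValueError("need at least two samples")
    mean_x = sum(c for c, _ in samples) / n
    mean_y = sum(us for _, us in samples) / n
    sxx = sum((c - mean_x) ** 2 for c, _ in samples)
    if sxx == 0:
        raise ValueError("all counts are identical; slope is undefined")
    sxy = sum((c - mean_x) * (us - mean_y) for c, us in samples)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


if __name__ == "__main__":
    # A few lines copied from each log above; paste the full logs for a real fit.
    i7_log = """\
Debug 18:23:49.7597872 count=49 decoding take microseconds = 267
Debug 18:23:49.7717878 count=194 decoding take microseconds = 823
Debug 18:23:50.7328428 count=101 decoding take microseconds = 583
"""
    xeon_log = """\
Debug 18:28:57.5087967 count=117 decoding take microseconds = 255
Debug 18:28:57.5087967 count=55 decoding take microseconds = 155
Debug 18:28:57.5399967 count=136 decoding take microseconds = 275
"""
    a_i7, b_i7 = fit_line(parse_log(i7_log))
    a_xe, b_xe = fit_line(parse_log(xeon_log))
    print(f"i7-860:  {a_i7:.0f} us fixed + {b_i7:.2f} us per count unit")
    print(f"E3-1220: {a_xe:.0f} us fixed + {b_xe:.2f} us per count unit")
    print(f"slope ratio (i7 / Xeon): {b_i7 / b_xe:.2f}")
```

If the slope ratio and the intercept ratio differ a lot, the gap is mostly in per-byte work rather than in fixed per-message overhead (or the reverse). That can help narrow down whether cache behaviour or something else is involved.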

Is that possible? In processor ratings these two processors are about the same.

Could this be because of the cache or something similar? And if so, which processor should I buy to make decoding another two times faster?

I've compared the CPUs using a Pi calculation tool. Results:

Pi 16k digits
Xeon           0.234 s
i7-860         0.171 s

Pi 512k digits
Xeon           5.31 s
i7-860 (no HT) 5.987 s
i7-860 (HT)    5.982 s

Pi 4M digits
Xeon           0.56 min
i7-860 (no HT) 1.11 min
i7-860 (HT)    1.05 min

So the Xeon is actually a little faster, but definitely not two times faster.

Turning off HT on the i7-860 doesn't change the picture.

i7-860, Windows 7, no HT:

Debug 10:09:30.7436690 count=58 decoding take microseconds = 351
Debug 10:09:34.9269083 count=47 decoding take microseconds = 347
Debug 10:09:34.9959122 count=50 decoding take microseconds = 309
Debug 10:09:35.0359145 count=45 decoding take microseconds = 297
Debug 10:09:35.1469209 count=57 decoding take microseconds = 344
Debug 10:09:35.1979238 count=54 decoding take microseconds = 460
Debug 10:09:35.2179249 count=61 decoding take microseconds = 372
Debug 10:09:35.3009297 count=51 decoding take microseconds = 275
Debug 10:09:35.3479324 count=45 decoding take microseconds = 305
Debug 10:09:35.3779341 count=58 decoding take microseconds = 311
Debug 10:09:35.3879346 count=50 decoding take microseconds = 286
Debug 10:09:35.4379375 count=48 decoding take microseconds = 290
Debug 10:09:35.4789398 count=48 decoding take microseconds = 277
Debug 10:09:35.5089416 count=49 decoding take microseconds = 286
Debug 10:09:35.5589444 count=74 decoding take microseconds = 382
Debug 10:09:35.5679449 count=47 decoding take microseconds = 298
Debug 10:09:35.7389547 count=50 decoding take microseconds = 304
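
Single timings of live messages mix decoder speed with whatever else the machine happens to be doing at that moment. A rough alternative is to decode the same message many times after a warm-up and take the median. The sketch below assumes the decoder can be called as a plain function. `decode`, `time_decode` and `toy_decode` are hypothetical names, and `toy_decode` is only a FIX tag=value splitter used as a stand-in, not a FAST decoder. Replaying the same recorded messages on both machines is also an assumption; the logs above come from live data.

```python
import statistics
import time


def time_decode(decode, message, repeats=1000, warmup=100):
    """Call decode(message) `warmup` times untimed, then `repeats` times timed.

    Returns the median per-call wall-clock time in microseconds. The median is
    less sensitive than the mean to one-off stalls (interrupts, page faults,
    other processes), which matters when comparing two machines.
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")
    for _ in range(warmup):
        decode(message)  # warm the caches before timing anything
    samples = []
    for _ in range(repeats):
        start = time.perf_counter_ns()
        decode(message)
        samples.append((time.perf_counter_ns() - start) / 1000.0)
    return statistics.median(samples)


def toy_decode(message):
    """Hypothetical stand-in: split a FIX tag=value message on SOH (0x01).

    This is NOT a FAST decoder; it only gives the harness something to time.
    """
    return dict(field.split(b"=", 1) for field in message.split(b"\x01") if field)


if __name__ == "__main__":
    msg = b"8=FIX.4.2\x019=12\x0135=0\x0134=1\x01"
    print(f"median: {time_decode(toy_decode, msg):.2f} us")
```

Running the same harness with the real decoder and the same message set on both machines gives numbers that are easier to compare than the live logs.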

Processor comparison: http://ark.intel.com/compare/52269,41316

The Xeon has a 50% higher Core Ratio, a 100% faster System Bus, AVX, ECC memory, Turbo Boost 2.0, AES, Intel® Demand Based Switching, Thermal Monitoring Technologies, Intel® Fast Memory Access and Intel® Flex Memory Access.

The i7-860 has HT and Enhanced Intel SpeedStep® Technology.

Perhaps the Xeon is two times faster because of all these extra technologies...

  • One thing I notice is that the E3-1220's clock speed is slightly higher. It also supports the AES instructions, although I'm not sure if this has anything to do with what you are doing.
    – gparent
    Commented Mar 28, 2012 at 14:36
  • Are the CPUs and OS the only things that differ between the machines? What about RAM, for example?
    – vartec
    Commented Mar 28, 2012 at 15:40
  • @Vartec I think everything else is pretty much the same: the i7-860 runs 12 GB DDR3-1333 9-9-9, and the E3-1220 runs 4 GB DDR3-1333 9-9-9.
    – javapowered
    Commented Mar 28, 2012 at 15:42
  • How about results when you deactivate Hyper-Threading on the i7?
    – vartec
    Commented Mar 28, 2012 at 15:45
  • @vartec I've updated the description.
    – javapowered
    Commented Mar 29, 2012 at 6:21

3 Answers

Two things you're completely missing out on.

Number 1 is a direct answer to your question. The Nehalem generation (aka "i7 1.0") was a huge step up from the Core 2 Duo, but after Sandy Bridge Intel started to struggle to find performance improvements, and performance plateaued. Clock for clock, the i7 from your Xeon's generation is around 20% faster than yours, and in somewhat L2-optimized workloads like this you can peg that closer to 40%, because the Xeon has a bigger L3 than its i7 counterpart. If this is the main process running on that system at the time (as in, it has the lion's share of resources), it gets even better, because fewer other things need some of the on-CPU memory.

Number 2 and probably the biggest factor here: Cache size.

Intel is well aware that only workloads smaller than the L2 cache actually run at the true speed of the processor; everything else runs at the speed at which the CPU can swap pieces of it in and out of L2. That's kind of positive in a way, because it disciplines programmers: they see far more benefit from writing smaller, more efficient code. But I also suspect it is a deliberate handicap on the consumer class, serving as the other main way of differentiating number-crunching performance between Xeon and i7 (other than core count and AVX, which are the main draws). Every other performance advantage that Xeons have over i7s is related to bandwidth, and depends on other companies improving their tech (to use that bandwidth) for the Xeon to justify its value in general (non-AVX/AVX2) workloads.

This is why your Xeon (like most Xeons) has a much bigger 1MB L2 cache, while your i7 has a 256KB L2 (likely slower too, due to its age). Because this process looks like an itemised, easily threaded set of tasks, probably written as a loop that fits into a small amount of memory, I would assume it benefits greatly from L2 cache speed and swaps less on the Xeon, vastly improving performance. Of course, for marketing reasons the L2 cache statistic was removed from Intel's Ark and replaced by the 'SmartCache' stat, which is basically your L3 cache, aka the cache used to share data between cores; it is much slower and less relevant to performance.
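
If you want to check the cache explanation on your own two machines, a common technique is a pointer-chasing test. You walk a random cycle through arrays of increasing size and watch the time per step jump as the working set outgrows each cache level. This is a rough sketch, not a rigorous benchmark. Python's interpreter overhead compresses the differences, so a C version would show the steps more sharply. The array sizes are arbitrary choices meant to straddle typical L2 and L3 sizes, and all names are illustrative.

```python
import random
import time
from array import array


def make_cycle(n, seed=0):
    """Random single-cycle permutation: nxt[i] is the next index to visit after i."""
    rng = random.Random(seed)
    order = list(range(n))
    rng.shuffle(order)
    nxt = array("q", [0]) * n  # n signed 64-bit slots, 8 bytes each
    for a, b in zip(order, order[1:] + order[:1]):
        nxt[a] = b
    return nxt


def chase(nxt, steps):
    """Follow the chain; each load depends on the previous one, defeating prefetching."""
    i = 0
    for _ in range(steps):
        i = nxt[i]
    return i


def ns_per_step(n_elems, steps=2_000_000):
    """Average nanoseconds per dependent load for a working set of n_elems * 8 bytes."""
    nxt = make_cycle(n_elems)
    chase(nxt, n_elems)  # warm-up: touch every element once
    start = time.perf_counter_ns()
    chase(nxt, steps)
    return (time.perf_counter_ns() - start) / steps


if __name__ == "__main__":
    for kib in (16, 128, 1024, 4096, 32768):
        n = kib * 1024 // 8
        print(f"{kib:6d} KiB working set: {ns_per_step(n):6.1f} ns/step")
```

Comparing where the time per step rises on each machine shows how much working set each CPU handles quickly. That is more informative than comparing cache sizes on a spec sheet.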

No two processors from different generations/manufacturers perform the same, regardless of their number of cores and clock speed. The E3-1220 is a processor from 2011 whilst the i7-860 is a processor from 2009, with an older architecture.

Even in synthetic benchmarks, the E3-1220 outperforms the i7-860.

If you're willing to spend a few extra $$ on the E3-1230, you will also get Hyper-Threading, which can offer a considerable performance increase over the E3-1220, which has no HT.

  • Actually, I'm considering buying a new server with a Xeon E5, but I don't know if it will be faster, because most of those servers are dual-processor and have a much lower clock rate, about 2 GHz.
    – javapowered
    Commented Mar 29, 2012 at 9:00
  • You should do some research by looking up reviews of the processors you plan to buy, instead of just looking at clock speeds, as these are rarely comparable between different processors.
    – gekkz
    Commented Mar 29, 2012 at 9:02

Have you run any other performance benchmarking utilities? It would be interesting to get an overview of the performance of both systems. Maybe there is a difference in some other hardware or software.

Are both computers running a 64-bit OS?

  • Yes, both computers are running a 64-bit OS. No, I didn't run other performance tests...
    – javapowered
    Commented Mar 28, 2012 at 18:08
  • I expect the difference here has more to do with the disks than the CPU. It may also have something to do with Turbo Boost. Commented Mar 28, 2012 at 18:31
  • @JoelCoehoorn I assume that "packet decoding" doesn't use disks. Commented Mar 29, 2012 at 13:24
  • @javapowered something in your question made me think you were doing your tests based on pre-recorded packets stored on disk. Commented Mar 29, 2012 at 13:40
  • No, I'm using live data from the network. Commented Mar 29, 2012 at 17:02
