Membandwidth timings

Here are some results of running my membandwidth tool on various computers and compiler versions. If you have interesting results, let me know so I can include them in this table (please state processor type and speed, memory speed, compiler version, and optimization-relevant flags).
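If you want to produce comparable numbers, the basic measurement is simple: copy a block repeatedly, time it, and divide bytes moved by elapsed time. The following is a minimal sketch of that idea, not the membandwidth tool's actual code. The function names, the convention 1 MB = 2^20 bytes, and the use of C11 timespec_get for timing are all assumptions.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Wall-clock time in seconds, using C11 timespec_get. */
static double now_seconds(void)
{
    struct timespec ts;
    timespec_get(&ts, TIME_UTC);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/* Convert a byte count and elapsed time into MB/s.
   Assumption: 1 MB = 2^20 bytes (the original tool's convention is unknown). */
static double mb_per_s(double bytes, double seconds)
{
    return bytes / (1024.0 * 1024.0) / seconds;
}

/* Hypothetical helper: copy a `block`-byte buffer repeatedly until roughly
   `total` bytes have been moved and return the observed memcpy bandwidth
   in MB/s. A small block (e.g. 1 KB) stays in cache; a large one (e.g. 8 MB)
   mostly measures main memory. Returns -1.0 on failure. */
double measure_memcpy(size_t block, size_t total)
{
    if (block == 0)
        return -1.0;

    unsigned char *src = malloc(block);
    unsigned char *dst = malloc(block);
    if (src == NULL || dst == NULL) {
        free(src);
        free(dst);
        return -1.0;
    }
    memset(src, 0x5a, block);   /* touch both buffers before timing */
    memset(dst, 0, block);

    size_t reps = total / block;
    if (reps == 0)
        reps = 1;

    unsigned long check = 0;
    double t0 = now_seconds();
    for (size_t i = 0; i < reps; i++) {
        memcpy(dst, src, block);
        check += dst[i % block];   /* read the result so copies are not dropped */
    }
    double t1 = now_seconds();

    volatile unsigned long sink = check;   /* keep `check` observable */
    (void)sink;

    double elapsed = t1 - t0;
    free(src);
    free(dst);
    if (elapsed <= 0.0)
        return -1.0;   /* timer too coarse for this amount of work */
    return mb_per_s((double)reps * (double)block, elapsed);
}
```

For example, measure_memcpy(1024, 256u << 20) corresponds roughly to the in-cache memcpy(1k) column and measure_memcpy(8u << 20, 256u << 20) to the memcpy(8M) column. Real tools usually also use a monotonic clock (such as POSIX CLOCK_MONOTONIC) and repeat each measurement several times.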

All bandwidths are in MB/s. The 1k and 8M columns give the block size: a 1k block stays in cache, while an 8M block is larger than the caches of all the CPUs listed, so those columns mostly reflect main-memory bandwidth. Where a cell holds several values separated by slashes, they come from separate runs (see notes 1 and 2).

System | Compiler, options | OS | memcpy 1k | memset 1k | blocksum8 1k | memcpy 8M | memset 8M | blocksum8 8M | Note
AMD Athlon XP2200+, PC2700 DDR | GCC-2.95.2 (-O3) | Linux 2.4 | 4147 | 5172 | 5947 | 419 | 600 | 1335 | -
AMD Athlon XP2200+, PC2700 DDR | Intel-6.0.1 (-O3 -xiMK -tpp6) | Linux 2.4 | 4408 | 5669 | 7822 | 423 | 601 | 1595 | -
AMD Athlon X2 4600+ EE, PC2-6400 DDR2 | GCC 4.1 (-O3) | Linux 2.6 | 12068 | 21225 | 10367 | 3016 | 6646 | 2589 | -
Intel P4 3.2 GHz | Intel-8 (-O3 -QaxKN) | WinXP | 9540 | 12175 | 11785 | 1865 | 4011 | 3829 | -
Intel P4 3.2 GHz | MSVC.NET Release | WinXP | 5411 | 6041 / 8147 | 9998 | 1280 | 1374 | 2736 | 1
Intel P4 3.2 GHz | GCC 3.2 (-O3) | WinXP/Cygwin | 6576 / 5000 / 4852 | 8258 / 6881 / 6670 | 8820 / 10088 / 9760 | 1623 / 1247 / 1101 | 1779 / 1348 / 1502 | 3511 / 2644 / 2524 | 2
Intel P4 3.2 GHz | MinGW 3.4.5 (-O3) | WinXP | 5044 | 7919 | 9892 | 1233 | 1499 | 2479 | 3
AMD Opteron 280, 2.3 GHz | Intel-8 (-O3) | WinXP | 2195 | 2522 | 8052 | 864 | 1319 | 2697 | 4
AMD Opteron 280, 2.3 GHz | Intel-8 (-O3 -QxW) | WinXP | 2195 | 2520 | 8881 | 880 | 1464 | 2334 | 4
AMD Opteron 280, 2.3 GHz | MSVC.NET Release | WinXP | 7864 | 13818 | 11118 | 1204 | 1988 | 2622 | 5
AMD Opteron 280, 2.3 GHz | MinGW 3.4.5 (-O3) | WinXP | 7997 | 13270 | 8105 | 1220 | 1721 | 2462 | 5
Iyonix PC (Intel XScale 600 MHz) | GCC-3.5.4, UnixLib (-O3) | RISC OS | 103.4 | 101.4 | 741.7 | 28.8 | 98.4 | 126.4 | 6


1) Note the huge difference for memset() in cache, which occurred consistently just by renaming the binary and copying it elsewhere: the slow result came from running it directly from the Release directory, the fast one from a renamed copy in another location. And no, debug info isn't the reason either...
2) These are really weird: there are huge fluctuations in this series. As far as I can tell from the Task Manager, nothing much was running during any of these measurements, yet the bandwidths differ enormously between runs. Either something's broken in Cygwin's CRT, or Windows... "frowns" on Cygwin.
3) There were some fluctuations in these measurements as well, but nowhere near as bad as in the Cygwin case.
4) The Intel compiler performs very badly on this AMD processor; SSE acceleration (-QxW) speeds up the in-cache blocksum8 a little (8881 vs. 8052 MB/s), but the results are still highly disappointing. Coincidence? I think not...
5) MSVC.NET and GCC (MinGW) perform a lot better than the Intel compiler on this AMD CPU. Both give very similar results; the only difference worth noting is in blocksum8, where I suppose MSVC.NET manages to squeeze in some SSE acceleration.
6) Yes, those really are decimal points you're seeing. I didn't bother with them for the other measurements because the program isn't that precise to begin with, but some of these values are low enough to actually warrant a decimal point, sad but true.
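Notes 2 and 3 show how noisy single runs can be. One common way to report such data is to repeat each measurement and summarize the runs, for instance by the best run (least disturbed by other activity) and the median. The sketch below does that for an array of MB/s samples; the struct and function names are hypothetical and not part of the membandwidth tool.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Summary of repeated bandwidth measurements (all values in MB/s). */
struct run_summary {
    double best;    /* highest bandwidth: the run least disturbed by noise */
    double median;  /* typical run */
    double worst;   /* lowest bandwidth */
    double spread;  /* (best - worst) / median, a rough noise indicator */
};

static int compare_doubles(const void *a, const void *b)
{
    double x = *(const double *)a;
    double y = *(const double *)b;
    return (x > y) - (x < y);
}

/* Summarize n >= 1 positive samples without modifying the caller's array.
   Returns 0 on success, -1 on bad input or allocation failure. */
int summarize_runs(const double *samples, size_t n, struct run_summary *out)
{
    if (samples == NULL || out == NULL || n == 0)
        return -1;

    double *sorted = malloc(n * sizeof *sorted);
    if (sorted == NULL)
        return -1;
    memcpy(sorted, samples, n * sizeof *sorted);
    qsort(sorted, n, sizeof *sorted, compare_doubles);

    out->worst = sorted[0];
    out->best = sorted[n - 1];
    out->median = (n % 2 == 1)
        ? sorted[n / 2]
        : (sorted[n / 2 - 1] + sorted[n / 2]) / 2.0;
    out->spread = (out->best - out->worst) / out->median;

    free(sorted);
    return 0;
}
```

Applied to the three Cygwin memcpy(1k) runs (6576, 5000 and 4852 MB/s), this gives a best of 6576, a median of 5000 and a spread of about 34%.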
