Memlat is a tiny benchmark program that measures cache and memory access latencies. I wrote it based on an idea from the article "What Every Programmer Should Know About Memory" by Ulrich Drepper. If you find a bug or have a comment, you can email me at minlee (at) cc.gatech.edu
[root@piquet memlat]# ./memlat
Usage: ./memlat size(KB) duration(second) random(0|1)
stride size is 64, random(0|1) is do_shuffle switch, so give 0 for sequential, 1 for random.
each run is 1sec, and the final report shows min,max,average for both cycles,performance(unit:Million elements traversed).
For accuracy, make sure stride size (current 64) == your cache line size.
Also set affinity to one cpu without running any other process.
Note that 2 or 3 cycles are typically measurable minimum due to size of core loop, so for L1 cache you'll see them even if it has actually less latency.
[root@piquet memlat]#
[root@piquet memlat]#
[root@piquet memlat]#
[root@piquet memlat]# ./memlat 512 10 1
64(STRIDE size) * 8192(# of stride) = 512 KB
cycle: 2994414795, count:189324540, so, 15.816306 cycles/memref
cycle: 2992305429, count:189193313, so, 15.816127 cycles/memref
cycle: 2992371714, count:189202611, so, 15.815700 cycles/memref
cycle: 2992375665, count:189144438, so, 15.820585 cycles/memref
cycle: 2992366656, count:189190240, so, 15.816707 cycles/memref
cycle: 2992360986, count:189201527, so, 15.815734 cycles/memref
cycle: 2992376781, count:189206673, so, 15.815387 cycles/memref
cycle: 2992372398, count:189206898, so, 15.815345 cycles/memref
cycle: 2992380669, count:189208316, so, 15.815270 cycles/memref
cycle: 2992382568, count:189207564, so, 15.815343 cycles/memref
summary: cycle 15.815 15.821 15.816 perf 189.144 189.324 189.208
[root@piquet memlat]#
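The usage text describes a buffer split into 64-byte strides that is traversed either sequentially or in shuffled order. A minimal sketch of that pointer-chasing idea might look like the following. It is not memlat's actual source: the names `struct elem`, `build_chain`, `chase`, and `ns_per_access` are hypothetical, and the sketch times with `clock_gettime` in nanoseconds instead of reporting cycles as memlat does.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

#define STRIDE 64  /* assumed cache line size in bytes, as in the usage text */

/* One element per stride: a next pointer padded out to STRIDE bytes. */
struct elem {
    struct elem *next;
    char pad[STRIDE - sizeof(struct elem *)];
};
_Static_assert(sizeof(struct elem) == STRIDE, "element must fill one stride");

static struct elem *volatile sink;  /* keeps traversal results observable */

/* Link n elements into one cycle. do_shuffle == 0 gives sequential order;
 * otherwise the visiting order is a Fisher-Yates permutation, which makes
 * the next address hard for a hardware prefetcher to predict.
 * rand() is used for brevity; it is biased when n approaches RAND_MAX. */
static struct elem *build_chain(size_t n, int do_shuffle, unsigned seed)
{
    if (n == 0) return NULL;
    struct elem *buf = aligned_alloc(STRIDE, n * sizeof(struct elem));
    size_t *order = malloc(n * sizeof *order);
    if (!buf || !order) { free(buf); free(order); return NULL; }

    for (size_t i = 0; i < n; i++) order[i] = i;
    if (do_shuffle) {
        srand(seed);
        for (size_t i = n - 1; i > 0; i--) {
            size_t j = (size_t)rand() % (i + 1);
            size_t t = order[i]; order[i] = order[j]; order[j] = t;
        }
    }
    /* Element order[i] points at element order[i + 1], wrapping at the end. */
    for (size_t i = 0; i < n; i++)
        buf[order[i]].next = &buf[order[(i + 1) % n]];
    free(order);
    return buf;  /* caller releases the chain with free() */
}

/* Follow the chain for `steps` dependent loads. Each address comes from the
 * previous load, so time per step approximates the access latency. */
static struct elem *chase(struct elem *p, uint64_t steps)
{
    while (steps--) p = p->next;
    return p;
}

/* Average nanoseconds per access over a working set of size_kb kilobytes.
 * Returns a negative value if the chain cannot be allocated. */
static double ns_per_access(size_t size_kb, int do_shuffle, uint64_t steps)
{
    assert(steps > 0);
    size_t n = size_kb * 1024 / STRIDE;  /* e.g. 512 KB / 64 B = 8192 */
    struct elem *chain = build_chain(n, do_shuffle, 1u);
    if (!chain) return -1.0;

    struct timespec t0, t1;
    sink = chase(chain, n);  /* warm-up pass touches every element once */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    sink = chase(chain, steps);
    clock_gettime(CLOCK_MONOTONIC, &t1);

    double ns = (double)(t1.tv_sec - t0.tv_sec) * 1e9
              + (double)(t1.tv_nsec - t0.tv_nsec);
    free(chain);
    return ns / (double)steps;
}
```

Calling `ns_per_access(512, 1, 100000000)` from a `main` would roughly correspond to a random-order run over a 512 KB working set. The per-run figures in the transcript follow the same arithmetic: 2994414795 cycles / 189324540 references gives the reported 15.816306 cycles/memref.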
Run it with a range of working set sizes, record the cycles per memory reference for each, and plot the results. I got graphs similar to the one above.
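One way to drive such a sweep is to generate the memlat command lines for a doubling series of sizes. The sketch below only prints the commands. The helper names `sweep_sizes` and `print_sweep` are hypothetical, and any size range passed to them (for example 4 KB to 64 MB) is an assumption to adjust to the cache hierarchy being measured.

```c
#include <stddef.h>
#include <stdio.h>

/* Fill sizes_kb[] with working set sizes that double from min_kb up to
 * max_kb (inclusive), writing at most cap entries. Returns the count. */
static size_t sweep_sizes(size_t min_kb, size_t max_kb,
                          size_t *sizes_kb, size_t cap)
{
    size_t n = 0;
    for (size_t kb = min_kb; kb > 0 && kb <= max_kb && n < cap; kb *= 2)
        sizes_kb[n++] = kb;
    return n;
}

/* Print one memlat invocation per size, using the argument order from the
 * usage text: size(KB) duration(second) random(0|1). */
static void print_sweep(FILE *out, size_t min_kb, size_t max_kb,
                        int duration_s, int do_shuffle)
{
    size_t sizes[64];
    size_t n = sweep_sizes(min_kb, max_kb, sizes, 64);
    for (size_t i = 0; i < n; i++)
        fprintf(out, "./memlat %zu %d %d\n", sizes[i], duration_s, do_shuffle);
}
```

Calling `print_sweep(stdout, 4, 65536, 10, 1)` from a `main` prints fifteen commands, from `./memlat 4 10 1` to `./memlat 65536 10 1`. The output can be piped to a shell, and the average cycle value from each `summary:` line then becomes one point on the plot.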
Last updated: Jan 2012