/* A guest language on this blog. A welcome, please, for C */ /* The reason that I am off on this particular one at the moment is because I recently waited 3 hours for a clojure program to terminate, after about a day trying to get it to run at all. */ /* When it became apparent that it was not going to finish any time soon, I hacked together an answer in C, using what should have been a less efficient algorithm that I had come up with while waiting. */ /* That program took about 15 minutes to write and 65 seconds to run, and got the correct answer. */ /* That comparison is entirely unfair to both clojure and C in all sorts of ways, but if I am going to spend time getting clojure to run at Cish speeds, I need to know what I should be aiming for. */ /* This program is what I am using as a comparison for (reduce + (map + _ _ )) */ /* To make sure that clever compilers and runtimes aren't doing any sneaky deadcode elimination, it is actually doing some sort of computation. But it is mainly mapping and reducing. Lots.*/ #include<stdio.h> #define N 1000000 int a[N]; int b[N]; int main(void) { int i, count; long long sum=0; for (i=0; i< N; i++) { a[i]=i; } for(count=0; count<1000; count++){ for (i=0; i< N; i++) { b[i]+=a[i]; } for (i=0; i< N; i++) { sum+=b[i]; } } printf("sum=%lli\n", sum); } /* gcc std=gnu99 Ofast efficiencyandprogress.c o efficiencyandprogress && time ./efficiencyandprogress */ /* sum=250249749750000000 */ /* real 0m16.053s */ /* user 0m15.992s */ /* sys 0m0.032s */ /* So it looks as though adding one array to another and then adding up all the values in an array takes about 16ms in total.*/ /* That's 16ns per array entry which looks sort of OK on my little netbook, which boast an Intel Atom CPU N455 running at 1.66GHz with a 512kb*/ /* I'm hoping there's enough complexity here that the compiler actually has to run the program rather than taking short cuts*/ /* But just as a check, here's the code running with gcc in 'do exactly what I say so I can debug it' mode.*/ /* gcc std=gnu99 O0 efficiencyandprogress.c o efficiencyandprogress && time ./efficiencyandprogress */ /* sum=250249749750000000 */ /* real 0m27.850s */ /* user 0m27.692s */ /* sys 0m0.060s */ /* This produces a small constant factor speedup, as expected if the two versions are doing the same work. */
Blog Archive

▼
2013
(53)

▼
September
(12)
 Efficiency and Progress III : How Close Can We Get...
 Efficiency and Progress II : Behold, C
 Efficiency and Progress : My God, Clojure is Slow!...
 How on earth do you program without *recursion*?
 Algorithms II
 Putting it all together : Using UnionFind in Krus...
 Implementing a Data Structure : UnionFind version...
 Implementing a Data Structure: UnionFind
 Kruskal's Algorithm for Minimal Spanning Trees
 The Knapsack Problem : Another Classic Puzzle from...
 Optimal Scheduling : A Classic Puzzle from Compute...
 SVG Graphics in a Clojure Web Application

▼
September
(12)
Wednesday, September 25, 2013
Efficiency and Progress II : Behold, C
Subscribe to:
Post Comments (Atom)
Thx again for your most interesting blog posts !
ReplyDeleteYou might find it interesting to compile your C code with «gcc S speed.c O4 o speed.S march=native ftreevectorize ftreevectorizerverbose=2 verboseasm» and look at the .S file.
I'd be quite surprised if the JIT could leverage the vector instructions like gcc does, unfortunately.
Cheers,
Bernard
Cool! Thanks Bernand, I didn't know you could do that. That doubles the speed again.
DeleteMy gcc doesn't have an O4, but does seem to have an Ofast. At any rate, the march=native is the trick.
gcc std=gnu99 efficiencyandprogress.c Ofast march=native o efficiencyandprogress && time ./efficiencyandprogress
sum=250249749750000000
real 0m8.606s
user 0m8.572s
sys 0m0.024s
Even faster to recognize that this is an arithmetic series.
ReplyDeleteThe first term is: 999999*500000 (the sum of the numbers from 0 to 1000000  it is just another arithmetic series)
The last term is: 1000 * 999999*500000
The number of terms: 1000
So the sum is: 1000 * 1001 * 999999*500000 / 2 = 250249749750000000
http://en.wikipedia.org/wiki/Arithmetic_progression#Sum
"the sum of the numbers from 0 to 1000000" > it is from 0 to 999999 of course
ReplyDeleteAnd if it is just about compiler microbenchmarks, this small haskell program runs in 0.4s in my machine:
ReplyDeletemain = print $ sum $ take 1001 [0,sum [0..999999]..]
I made a version more similar to the C one, and it runs the same speed (0.30.4s):
main = print $ sum $ take 1000 $ zipWith (*) [1..] (repeat (sum [0..999999]))
With O0 it is 1.3s
I've managed to reach 140s running time with this code, this one finally forces the program to really make the 1000000 long iteration O(1000) times:
ReplyDeletemain = print $ sum $ take 1000 $ map sum $ zipWith (\x y > map (*x) y) [1..] (repeat [0..999999])
This runs in 0.3 sec on my pc:
ReplyDelete#include
#define N 1000000
int b[N];
int main(void)
{
unsigned int i, count;
long long sum=0;
for(count=0; count<1000; count++){
for (i=0; i< N; i++) {
b[i]+=i;
sum+=b[i];
}
}
printf("sum=%lli\n", sum);
}
time ./fast
sum=250249749750000000
real 0m0.360s
user 0m0.360s
sys 0m0.000s
the version with multiple loops is slower, i think because of cache effects