
Random Generation isn’t that slow?

July 14th, 2009 Stephen Larew

I decided to skip global memory entirely and implement the MD5-hash random number generator directly in a device function called by the GenerateNullHypothesisKernel kernel. Generating the random numbers inside the kernel, instead of writing them to global memory first, could speed up the random generator and finally deliver an actual overall speedup. Unfortunately, things got confusing.
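The post doesn't show the device-side generator, so here is a minimal sketch of the idea under stated assumptions. It uses a counter-based approach where each thread hashes (seed, thread index, draw index) into a uniform number and consumes it immediately. To keep it short it substitutes Thomas Wang's 32-bit integer hash for MD5. The kernel name nullHypothesisSketchKernel, its parameters, and the toy Bernoulli count standing in for real null-hypothesis data are all hypothetical.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Thomas Wang's 32-bit integer hash: a cheap stand-in for MD5 (assumption).
__host__ __device__ inline unsigned int wangHash(unsigned int x) {
    x = (x ^ 61u) ^ (x >> 16);
    x *= 9u;
    x ^= x >> 4;
    x *= 0x27d4eb2du;
    x ^= x >> 15;
    return x;
}

// Per-thread stream key derived from the seed and the global thread index.
__host__ __device__ inline unsigned int streamKey(unsigned int seed, unsigned int tid) {
    return wangHash(seed ^ wangHash(tid));
}

// i-th uniform draw in [0, 1) for a stream, built from the top 24 hash bits.
__host__ __device__ inline float uniformFromHash(unsigned int stream, unsigned int i) {
    unsigned int h = wangHash(stream ^ wangHash(i + 0x9E3779B9u));
    return (h >> 8) * (1.0f / 16777216.0f);
}

// Hypothetical kernel: each thread draws its random numbers on the fly and
// consumes them immediately, so no buffer of random numbers sits in global memory.
__global__ void nullHypothesisSketchKernel(unsigned int seed, float p,
                                           int samplesPerThread,
                                           int *counts, int nThreads) {
    int tid = blockIdx.x * blockDim.x + threadIdx.x;
    if (tid >= nThreads) return;
    unsigned int stream = streamKey(seed, (unsigned int)tid);
    int count = 0;
    for (int i = 0; i < samplesPerThread; ++i) {
        // Toy Bernoulli(p) event standing in for real null-hypothesis data.
        if (uniformFromHash(stream, (unsigned int)i) < p) ++count;
    }
    counts[tid] = count;  // only the per-thread result touches global memory
}

// Host helper: launch the kernel and sum the per-thread counts.
long long totalNullCount(unsigned int seed, float p, int samplesPerThread, int nThreads) {
    int *dCounts = nullptr;
    cudaError_t err = cudaMalloc(&dCounts, nThreads * sizeof(int));
    assert(err == cudaSuccess);
    const int threads = 256;
    const int blocks = (nThreads + threads - 1) / threads;
    nullHypothesisSketchKernel<<<blocks, threads>>>(seed, p, samplesPerThread, dCounts, nThreads);
    std::vector<int> hCounts(nThreads);
    // cudaMemcpy blocks until the kernel has finished.
    err = cudaMemcpy(hCounts.data(), dCounts, nThreads * sizeof(int), cudaMemcpyDeviceToHost);
    assert(err == cudaSuccess);
    cudaFree(dCounts);
    long long total = 0;
    for (int c : hCounts) total += c;
    return total;
}
```

Because the hash helpers are marked `__host__ __device__`, the same draws can be reproduced on the CPU to check the GPU output.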

I implemented this, and null hypothesis generation sped up considerably, but the scanning of the data slowed down! I realized this was caused by the way kernel launches are handled. A kernel launch is asynchronous: control returns to the host before the kernel finishes, so a timer wrapped around the launch alone measures almost nothing, and the kernel's real runtime gets charged to whatever blocks next (here, the scanning step). To time a kernel individually, you must call cudaThreadSynchronize() right after the launch to wait for the kernel to finish.
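A small sketch of the timing pitfall, assuming a hypothetical busyKernel that spins on clock64() so it has a known, nonzero runtime. It uses cudaDeviceSynchronize(), which replaced the now-deprecated cudaThreadSynchronize() in later CUDA releases; the helper name timeLaunchMs is also hypothetical.

```cuda
#include <cassert>
#include <chrono>
#include <cuda_runtime.h>

// Spins for roughly `cycles` SM clock cycles so the kernel has a measurable runtime.
__global__ void busyKernel(long long cycles) {
    long long start = clock64();
    while (clock64() - start < cycles) { }
}

// Host wall-clock time around one launch, in milliseconds.
// Without synchronization this measures only the asynchronous launch itself.
double timeLaunchMs(long long cycles, bool synchronize) {
    auto t0 = std::chrono::steady_clock::now();
    busyKernel<<<1, 32>>>(cycles);
    if (synchronize) {
        // Modern replacement for the deprecated cudaThreadSynchronize().
        cudaError_t err = cudaDeviceSynchronize();
        assert(err == cudaSuccess);
    }
    auto t1 = std::chrono::steady_clock::now();
    // Drain the GPU so this kernel's work cannot leak into the next measurement.
    cudaDeviceSynchronize();
    return std::chrono::duration<double, std::milli>(t1 - t0).count();
}
```

Called with `synchronize` set to false, `timeLaunchMs` returns almost immediately; without the final drain, the kernel's runtime would show up in the next blocking call, such as a `cudaMemcpy` or a later kernel timed with synchronization. CUDA events (`cudaEventRecord` and `cudaEventElapsedTime`) are another way to time GPU work.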

I need to confirm this, but it seems that random number generation isn't the culprit after all; the slow part is actually the scanning and the likelihood calculations. I will rerun the timings in the morning.

I am now implementing a different way of calculating likelihoods. I will do a sum-scan (prefix sum) over the window data, which lets each thread calculate just one likelihood. There will be one thread per window instead of one per grid point. I suspect this will be faster, and it should expose the maximum amount of concurrency, but we will see.
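A minimal 1D sketch of this plan, assuming per-cell observed counts and expected baselines along a line and fixed-length windows; the post doesn't describe the window geometry or the likelihood. It uses Thrust's `inclusive_scan` for the sum-scan and an expectation-based Poisson log-likelihood ratio purely as a stand-in statistic. The names windowScoreKernel and scoreWindows are hypothetical.

```cuda
#include <cassert>
#include <cmath>
#include <vector>
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/scan.h>

// One thread per window [w, w + len): two prefix-sum lookups replace a loop
// over the window's cells.
__global__ void windowScoreKernel(const float *prefC, const float *prefB,
                                  int nWindows, int len, float *scores) {
    int w = blockIdx.x * blockDim.x + threadIdx.x;
    if (w >= nWindows) return;
    float c = prefC[w + len] - prefC[w];  // observed count in the window
    float b = prefB[w + len] - prefB[w];  // expected count (baseline) in the window
    // Assumed statistic: expectation-based Poisson log-likelihood ratio.
    scores[w] = (c > b && b > 0.0f) ? c * logf(c / b) + b - c : 0.0f;
}

std::vector<float> scoreWindows(const std::vector<float> &counts,
                                const std::vector<float> &baselines, int len) {
    const int n = static_cast<int>(counts.size());
    if (len <= 0 || len > n || baselines.size() != counts.size()) return {};
    thrust::device_vector<float> dC(counts.begin(), counts.end());
    thrust::device_vector<float> dB(baselines.begin(), baselines.end());
    // Prefix arrays of length n + 1 with pref[0] = 0,
    // so the sum over cells [i, j) is pref[j] - pref[i].
    thrust::device_vector<float> prefC(n + 1, 0.0f), prefB(n + 1, 0.0f);
    thrust::inclusive_scan(dC.begin(), dC.end(), prefC.begin() + 1);
    thrust::inclusive_scan(dB.begin(), dB.end(), prefB.begin() + 1);

    const int nWindows = n - len + 1;
    thrust::device_vector<float> dScores(nWindows);
    const int threads = 256;
    const int blocks = (nWindows + threads - 1) / threads;
    windowScoreKernel<<<blocks, threads>>>(thrust::raw_pointer_cast(prefC.data()),
                                           thrust::raw_pointer_cast(prefB.data()),
                                           nWindows, len,
                                           thrust::raw_pointer_cast(dScores.data()));
    std::vector<float> scores(nWindows);
    // Device-to-host copy waits for the kernel to finish.
    thrust::copy(dScores.begin(), dScores.end(), scores.begin());
    return scores;
}
```

Each window then costs two subtractions instead of a loop over its cells. Float prefix sums lose precision over long inputs, so a real version might scan in double.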
