Designing parallel algorithms on CUDA