Cython vs. NumPy speed

I am trying Cython for the first time and attempted to convert a function from pure NumPy to Cython.

Here are the two functions:

from __future__ import division 
import numpy as np 
cimport numpy as np 

DTYPEf = np.float64 
ctypedef np.float64_t DTYPEf_t 

DTYPEi = np.int64 
ctypedef np.int64_t DTYPEi_t 

DTYPEu = np.uint8 
ctypedef np.uint8_t DTYPEu_t 

cimport cython 

@cython.boundscheck(False) 
@cython.wraparound(False) 
def twodcitera(np.ndarray[DTYPEf_t, ndim=3] data, int res, int indexl, int indexu, float radius1, float radius2, output, float height1, float height2): 
    ''' 
    Function to return correlation for fixed radius using Cython 
    ''' 
    cdef float sum_mask = 0 
    cdef int i,j,k 
    cdef int a, b, c 
    cdef np.ndarray[DTYPEi_t, ndim=3] x 
    cdef np.ndarray[DTYPEi_t, ndim=3] y 
    cdef np.ndarray[DTYPEi_t, ndim=3] z 
    cdef np.ndarray[DTYPEu_t, ndim=3, cast=True] R 

    a,b,c = res//2,res//2,res//2 
    x,y,z = np.ogrid[-a:a,-b:b,-c:c]  

    for i in xrange(indexl,indexu): 
        for j in xrange(1): 
            for k in xrange(1): 
                R = np.roll(np.roll(np.roll(np.logical_and(np.logical_or(np.logical_and(z>height1,z<=height2), np.logical_and(z<-height1,z>=-height2)), np.logical_and(x**2 + y**2<= radius2**2, x**2 + y**2 > radius1**2)), (i-a), axis =0), (j-a), axis =1), (k-a), axis =2) 
                sum_mask += (data[i][j][k] * np.average(data[R])) 

    output.put(sum_mask)

And the NumPy implementation:

def no_twodcitera(data, res, indexl, indexu, radius1, radius2, output, height1, height2): 
    ''' 
    Function to return correlation for fixed radius 
    ''' 
    a,b,c = res/2,res/2,res/2  
    x,y,z = np.ogrid[-a:a,-b:b,-c:c]  
    sum_mask = 0 
    for i in xrange(indexl,indexu): 
        for j in xrange(1): 
            for k in xrange(1): 
                R = np.roll(np.roll(np.roll(np.logical_and(np.logical_or(np.logical_and(z>height1,z<=height2), np.logical_and(z<-height1,z>=-height2)), np.logical_and(x**2 + y**2<= radius2**2, x**2 + y**2 > radius1**2)), (i-a), axis =0), (j-a), axis =1), (k-a), axis =2) 
                sum_mask += (data[i][j][k] * np.average(data[R])) 

    output.put(sum_mask)

Both functions take essentially the same time to complete:

%timeit -n200 -r10 twodcitera(dd, tes_res,in1,in2,r[k],r[k+1], output, r[l], r[l+1]) 
200 loops, best of 10: 1.57 ms per loop 

%timeit -n200 -r10 no_twodcitera(dd, tes_res,in1,in2,r[k],r[k+1], output, r[l], r[l+1]) 
200 loops, best of 10: 1.57 ms per loop

I was wondering what I am doing wrong, or whether I have misunderstood something about how to use Cython. The inputs:

import multiprocessing as mp 
import numpy as np 

dd = np.random.randn(64,64,64) 
res = 64 
r = np.arange(0,21,2) 
in1 = 0 
in2 = 1 
l = 5 
k = 7 
output = mp.Queue()

Thanks; I would appreciate it if you could point out my misunderstanding here.

2015-06-09 MisterJ

The xrange for j and k was kept at 1 for testing purposes only; eventually it will be j in xrange(res) and k in xrange(res). – MisterJ

Have you tried running the code through cython -a? http://docs.cython.org/src/quickstart/cythonize.html#determining-where-to-add-types –

And what are in1, in2, etc.? –

Without knowing the input and output, the following compiled for me, following the Cython guide. If you explain how to create test input, I could provide further help.

EDIT: My first thought was that something might be going wrong with the Cython compilation, but I could not find anything useful. This answer is not very helpful for solving the speed problem. In any case, I am leaving it here for anyone interested in testing and understanding.

Put the code in test.pyx

cimport cython 
import numpy as np 
cimport numpy as np 

DTYPEf = np.float64 
ctypedef np.float64_t DTYPEf_t 

DTYPEi = np.int64 
ctypedef np.int64_t DTYPEi_t 

DTYPEu = np.uint8 
ctypedef np.uint8_t DTYPEu_t 


@cython.boundscheck(False) 
@cython.wraparound(False) 
def twodcitera(np.ndarray[DTYPEf_t, ndim=3] data, int res, int indexl, int indexu, float radius1, float radius2, output, float height1, float height2): 
    ''' 
    Function to return correlation for fixed radius using Cython 
    ''' 
    cdef float sum_mask = 0 
    cdef int i,j,k 
    cdef int a, b, c 
    cdef np.ndarray[DTYPEi_t, ndim=3] x 
    cdef np.ndarray[DTYPEi_t, ndim=3] y 
    cdef np.ndarray[DTYPEi_t, ndim=3] z 
    cdef np.ndarray[DTYPEu_t, ndim=3, cast=True] R 
    a,b,c = res//2,res//2,res//2 
    x,y,z = np.ogrid[-a:a,-b:b,-c:c] 
    for i in xrange(indexl,indexu): 
        for j in xrange(1): 
            for k in xrange(1): 
                R = np.roll(np.roll(np.roll(np.logical_and(np.logical_or(np.logical_and(z>height1,z<=height2), np.logical_and(z<-height1,z>=-height2)), np.logical_and(x**2 + y**2<= radius2**2, x**2 + y**2 > radius1**2)), (i-a), axis =0), (j-a), axis =1), (k-a), axis =2) 
                sum_mask += (data[i][j][k] * np.average(data[R])) 
    output.put(sum_mask)

Create a setup.py file and put this in it:

from distutils.core import setup 
from Cython.Build import cythonize 

setup(
    name = "testapp", 
    ext_modules = cythonize('test.pyx'), # accepts a glob pattern 
    )

Go to the shell and compile it:

$ python setup.py build_ext --inplace

Go to IPython and try to import it:

import test

That worked for me.

The speed test showed:

In [28]: %timeit -n200 -r10 no_twodcitera(dd, res,in1,in2,r[k],r[k+1], output, r[l], r[l+1]) 
200 loops, best of 10: 1.29 ms per loop 

In [29]: %timeit -n200 -r10 test.twodcitera(dd, res,in1,in2,r[k],r[k+1], output, r[l], r[l+1]) 
200 loops, best of 10: 1.31 ms per loop

So the results are the same and there is no big difference. In addition, I ran cProfile to see whether anything stands out in the call stack during execution. Admittedly, cProfile output is hard to interpret when run times are on the order of milliseconds! But let's try.

In [34]: cProfile.run("""no_twodcitera(dd, res,in1,in2,r[k],r[k+1], output, r[l], r[l+1])""") 
     82 function calls in 0.004 seconds 

    Ordered by: standard name 

    ncalls tottime percall cumtime percall filename:lineno(function) 
     1 0.001 0.001 0.004 0.004 <ipython-input-27-663e142d15fb>:1(no_twodcitera) 
     1 0.000 0.000 0.004 0.004 <string>:1(<module>) 
     1 0.000 0.000 0.000 0.000 _methods.py:43(_count_reduce_items) 
     1 0.000 0.000 0.000 0.000 _methods.py:53(_mean) 
     1 0.000 0.000 0.000 0.000 function_base.py:436(average) 
     1 0.000 0.000 0.000 0.000 index_tricks.py:151(__getitem__) 
     3 0.000 0.000 0.002 0.001 numeric.py:1279(roll) 
     1 0.000 0.000 0.000 0.000 numeric.py:394(asarray) 
     4 0.000 0.000 0.000 0.000 numeric.py:464(asanyarray) 
     1 0.000 0.000 0.000 0.000 queues.py:99(put) 
     1 0.000 0.000 0.000 0.000 threading.py:299(_is_owned) 
     1 0.000 0.000 0.000 0.000 threading.py:372(notify) 
     1 0.000 0.000 0.000 0.000 threading.py:63(_note) 
     1 0.000 0.000 0.000 0.000 {hasattr} 
     18 0.000 0.000 0.000 0.000 {isinstance} 
     1 0.000 0.000 0.000 0.000 {issubclass} 
     5 0.000 0.000 0.000 0.000 {len} 
     3 0.000 0.000 0.000 0.000 {math.ceil} 
     1 0.000 0.000 0.000 0.000 {method 'acquire' of '_multiprocessing.SemLock' objects} 
     2 0.000 0.000 0.000 0.000 {method 'acquire' of 'thread.lock' objects} 
     1 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects} 
     3 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 
     1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 
     1 0.000 0.000 0.000 0.000 {method 'mean' of 'numpy.ndarray' objects} 
     1 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects} 
     1 0.000 0.000 0.000 0.000 {method 'release' of 'thread.lock' objects} 
     3 0.002 0.001 0.002 0.001 {method 'take' of 'numpy.ndarray' objects} 
     9 0.000 0.000 0.000 0.000 {numpy.core.multiarray.arange} 
     5 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array} 
     3 0.000 0.000 0.000 0.000 {numpy.core.multiarray.concatenate} 
     4 0.000 0.000 0.000 0.000 {range} 
     1 0.000 0.000 0.000 0.000 {zip} 



In [35]: cProfile.run("""test.twodcitera(dd, res,in1,in2,r[k],r[k+1], output, r[l], r[l+1])""") 
     82 function calls in 0.003 seconds 

    Ordered by: standard name 

    ncalls tottime percall cumtime percall filename:lineno(function) 
     1 0.000 0.000 0.003 0.003 <string>:1(<module>) 
     1 0.000 0.000 0.000 0.000 _methods.py:43(_count_reduce_items) 
     1 0.000 0.000 0.000 0.000 _methods.py:53(_mean) 
     1 0.000 0.000 0.000 0.000 function_base.py:436(average) 
     1 0.000 0.000 0.000 0.000 index_tricks.py:151(__getitem__) 
     3 0.000 0.000 0.001 0.000 numeric.py:1279(roll) 
     1 0.000 0.000 0.000 0.000 numeric.py:394(asarray) 
     4 0.000 0.000 0.000 0.000 numeric.py:464(asanyarray) 
     1 0.000 0.000 0.000 0.000 queues.py:99(put) 
     1 0.000 0.000 0.000 0.000 threading.py:299(_is_owned) 
     1 0.000 0.000 0.000 0.000 threading.py:372(notify) 
     1 0.000 0.000 0.000 0.000 threading.py:63(_note) 
     1 0.000 0.000 0.000 0.000 {hasattr} 
     18 0.000 0.000 0.000 0.000 {isinstance} 
     1 0.000 0.000 0.000 0.000 {issubclass} 
     5 0.000 0.000 0.000 0.000 {len} 
     3 0.000 0.000 0.000 0.000 {math.ceil} 
     1 0.000 0.000 0.000 0.000 {method 'acquire' of '_multiprocessing.SemLock' objects} 
     2 0.000 0.000 0.000 0.000 {method 'acquire' of 'thread.lock' objects} 
     1 0.000 0.000 0.000 0.000 {method 'append' of 'collections.deque' objects} 
     3 0.000 0.000 0.000 0.000 {method 'append' of 'list' objects} 
     1 0.000 0.000 0.000 0.000 {method 'disable' of '_lsprof.Profiler' objects} 
     1 0.000 0.000 0.000 0.000 {method 'mean' of 'numpy.ndarray' objects} 
     1 0.000 0.000 0.000 0.000 {method 'reduce' of 'numpy.ufunc' objects} 
     1 0.000 0.000 0.000 0.000 {method 'release' of 'thread.lock' objects} 
     3 0.001 0.000 0.001 0.000 {method 'take' of 'numpy.ndarray' objects} 
     9 0.000 0.000 0.000 0.000 {numpy.core.multiarray.arange} 
     5 0.000 0.000 0.000 0.000 {numpy.core.multiarray.array} 
     3 0.000 0.000 0.000 0.000 {numpy.core.multiarray.concatenate} 
     4 0.000 0.000 0.000 0.000 {range} 
     1 0.001 0.001 0.003 0.003 {test.twodcitera} 
     1 0.000 0.000 0.000 0.000 {zip}
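
A single call of a few milliseconds, listed by name, is hard to read. As a sketch only (it was not part of the session above), one could repeat the call many times and sort the report by internal time with pstats. The sketch assumes Python 3; `work` is a hypothetical stand-in for the profiled function and `profile_repeated` is a helper name invented for this illustration.

```python
import cProfile
import io
import pstats

import numpy as np


def work(data, shift):
    # Hypothetical stand-in for the profiled function:
    # roll a boolean mask along axis 0, then average the selected values.
    mask = np.roll(data > 0, shift, axis=0)
    return data[mask].mean()


def profile_repeated(func, args, repeats=200, top=5):
    """Profile `repeats` calls of func(*args); return a report sorted by internal time."""
    profiler = cProfile.Profile()
    profiler.enable()
    for _ in range(repeats):
        func(*args)
    profiler.disable()

    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    # "tottime" sorts by time spent inside each function itself,
    # and print_stats(top) limits the listing to the first `top` rows.
    stats.sort_stats("tottime").print_stats(top)
    return stream.getvalue()


data = np.random.randn(64, 64, 64)
report = profile_repeated(work, (data, 3))
print(report)
```

Accumulating many calls makes the per-function times large enough to compare, and the sorted listing puts the most expensive entries first.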

Unfortunately, nothing stands out. I would conclude that the reason may be that NumPy is already well implemented and most of the time is not lost in the nested loops. Also, Cython gets its speed mainly from static typing; because nearly all the work here is done through NumPy calls, typing cannot bring much benefit.
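
One thing that can be changed without Cython: the mask expression inside the loop does not depend on i, j or k, only the roll does. A minimal sketch, assuming Python 3 and a NumPy recent enough to accept tuple shifts in np.roll, builds the mask once and rolls it per iteration. The names `shell_mask`, `correlation_hoisted` and `correlation_original` are made up for this illustration, and the functions return the sum instead of putting it on a queue. No timing is claimed here; the sketch only checks that both variants give the same number.

```python
import numpy as np


def shell_mask(res, radius1, radius2, height1, height2):
    """Boolean mask built once; it does not depend on the loop indices."""
    a = res // 2
    x, y, z = np.ogrid[-a:a, -a:a, -a:a]
    rho2 = x**2 + y**2
    in_ring = (rho2 > radius1**2) & (rho2 <= radius2**2)
    in_slab = ((z > height1) & (z <= height2)) | ((z < -height1) & (z >= -height2))
    return in_ring & in_slab  # broadcasts to shape (res, res, res)


def correlation_hoisted(data, res, indexl, indexu, radius1, radius2, height1, height2):
    """Mask computed outside the loop; only the roll stays inside."""
    a = res // 2
    base = shell_mask(res, radius1, radius2, height1, height2)
    total = 0.0
    for i in range(indexl, indexu):
        for j in range(1):
            for k in range(1):
                rolled = np.roll(base, (i - a, j - a, k - a), axis=(0, 1, 2))
                total += data[i, j, k] * data[rolled].mean()
    return total


def correlation_original(data, res, indexl, indexu, radius1, radius2, height1, height2):
    """Same arithmetic as the question's loop: the mask is rebuilt on every pass."""
    a = res // 2
    x, y, z = np.ogrid[-a:a, -a:a, -a:a]
    total = 0.0
    for i in range(indexl, indexu):
        for j in range(1):
            for k in range(1):
                mask = np.logical_and(
                    np.logical_or(
                        np.logical_and(z > height1, z <= height2),
                        np.logical_and(z < -height1, z >= -height2),
                    ),
                    np.logical_and(x**2 + y**2 <= radius2**2, x**2 + y**2 > radius1**2),
                )
                R = np.roll(np.roll(np.roll(mask, i - a, axis=0), j - a, axis=1), k - a, axis=2)
                total += data[i][j][k] * np.average(data[R])
    return total


rng = np.random.default_rng(0)
dd = rng.standard_normal((64, 64, 64))
args = (dd, 64, 0, 2, 14, 16, 10, 12)  # radii and heights as in r[7], r[8], r[5], r[6]
print(np.isclose(correlation_original(*args), correlation_hoisted(*args)))
```

The remaining per-iteration cost is the roll and the boolean indexing, both of which already run inside NumPy, which is consistent with the observation that adding C types around them changes little.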

2015-06-10 07:54:15 PlagTag

The problem is not compilation but performance. –

As @moarningsun said, I was able to compile it; I was just looking into how I could improve performance. I have edited my question to state clearly which inputs I am using. Thanks. – MisterJ

You are right, sorry for the confusion. At first I thought compilation might be the cause, but I had no test data. I also looked at the execution profile, but it showed little. Perhaps one could try scipy weave to speed things up, or implement it in Fortran. – PlagTag
