python - numpy: cumulative multiplicity count -
i have sorted array of ints might have repetitions. count consecutive equal values, restarting 0 when value different previous one. expected result implemented simple python loop:
import numpy np def count_multiplicities(a): r = np.zeros(a.shape, dtype=a.dtype) in range(1, len(a)): if a[i] == a[i-1]: r[i] = r[i-1]+1 else: r[i] = 0 return r = (np.random.rand(20)*5).astype(dtype=int) a.sort() print "given sorted array: ", print "multiplicity count: ", count_multiplicities(a) output:
given sorted array: [0 0 0 0 0 1 1 1 2 2 2 2 3 3 3 3 4 4 4 4] multiplicity count: [0 1 2 3 4 0 1 2 0 1 2 3 0 1 2 3 0 1 2 3] how can same result in efficient way using numpy? array long, repetitions few (say no more ten).
in special case know values start 0 , difference between consecutive values either 0 or 1 (no gaps in values).
here's 1 cumsum based vectorized approach -
def count_multiplicities_cumsum_vectorized(a): out = np.ones(a.size,dtype=int) idx = np.flatnonzero(a[1:] != a[:-1])+1 out[idx[0]] = -idx[0] + 1 out[0] = 0 out[idx[1:]] = idx[:-1] - idx[1:] + 1 np.cumsum(out, out=out) return out sample run -
in [58]: out[58]: array([0, 0, 0, 0, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 3, 3, 3, 4, 4, 4]) in [59]: count_multiplicities(a) # original approach out[59]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2]) in [60]: count_multiplicities_cumsum_vectorized(a) out[60]: array([0, 1, 2, 3, 0, 1, 2, 3, 4, 0, 1, 2, 3, 4, 0, 1, 2, 0, 1, 2]) runtime test -
in [66]: = (np.random.rand(200000)*1000).astype(dtype=int) ...: a.sort() ...: in [67]: out[67]: array([ 0, 0, 0, ..., 999, 999, 999]) in [68]: %timeit count_multiplicities(a) 10 loops, best of 3: 87.2 ms per loop in [69]: %timeit count_multiplicities_cumsum_vectorized(a) 1000 loops, best of 3: 739 µs per loop
Comments
Post a Comment