Python Idiom for applying sequential steps to an iterable


When doing data processing tasks, I often find myself applying a series of compositions, vectorized functions, etc. to an input iterable of data to generate a final result. Ideally this would work for both lists and generators (in addition to any other iterable). I can think of a number of approaches to structuring code to accomplish this, but every way I can think of has one or more aspects that feel unclean/unidiomatic to me. I have outlined below the different methods I can think of. My question is: is there a recommended, idiomatic way to do this?

The methods I can think of, illustrated with a simple but representative example:

Write one large expression

    result = [sum(group) for key, group in itertools.groupby(
                  filter(lambda x: x <= 2, [x**2 for x in input]),
                  keyfunc=lambda x: x % 3)]

This is quite difficult to read for a non-trivial sequence of steps. When reading through the code, one encounters each step in reverse order.

Save each step to a different variable name

    squared = [x**2 for x in input]
    filtered = filter(lambda x: x < 2, squared)
    grouped = itertools.groupby(filtered, keyfunc=lambda x: x % 3)
    result = [sum(group) for key, group in grouped]

This introduces a number of local variables that can be hard to name descriptively; additionally, if the result of any of the intermediate steps is large, keeping them around is wasteful of memory. If one wants to add a step to the process, care must be taken that the variable names are updated correctly. For example, if I wished to divide every number by 2, I would add the line halved = [x / 2.0 for x in filtered], and then have to remember to change filtered to halved in the following line.
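A minimal sketch of the named-steps approach, assuming Python 3 and using generator expressions so that no intermediate list is kept alive. The function name process is hypothetical. It includes the halving step described above and marks the following line, which must be edited to read from halved. It passes groupby's key function positionally, since groupby does not accept a keyfunc keyword.

```python
import itertools

def process(data):
    # Each name is bound to a lazy iterator, so no intermediate list is built.
    squared = (x**2 for x in data)
    filtered = (x for x in squared if x < 2)
    # Newly inserted step: the next line must now read from `halved`
    # instead of `filtered`, which is the renaming hazard described above.
    halved = (x / 2.0 for x in filtered)
    grouped = itertools.groupby(halved, lambda x: x % 3)
    return [sum(group) for key, group in grouped]
```

Because every stage is a generator, process accepts lists, ranges, and one-shot iterators alike.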

Store each step under the same variable name

    tmp = [x**2 for x in input]
    tmp = filter(lambda x: x < 2, tmp)
    tmp = itertools.groupby(tmp, keyfunc=lambda x: x % 3)
    result = [sum(group) for key, group in tmp]

I guess this seems to me the least bad of these options, but storing things in a generically named placeholder variable feels un-Pythonic to me and makes me suspect there is a better way out there.
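One alternative, not taken from the question or the answer, is to write each step as a one-argument function and fold the input through them with functools.reduce. The helper pipeline below is hypothetical (it is not part of the standard library); a minimal sketch, assuming Python 3. The steps read top to bottom in evaluation order and need no intermediate names.

```python
import functools
import itertools

def pipeline(data, *steps):
    """Apply each callable in `steps` to the running result, left to right."""
    return functools.reduce(lambda acc, step: step(acc), steps, data)

result = pipeline(
    range(-3, 4),
    lambda xs: (x**2 for x in xs),                 # square
    lambda xs: (x for x in xs if x < 2),           # filter
    lambda xs: itertools.groupby(xs, lambda x: x % 3),  # group by x % 3
    lambda groups: [sum(group) for key, group in groups],  # sum each group
)
```

Adding a step means adding one lambda to the argument list; the trade-off is that each step must be wrapped as a function of the whole iterable.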

Code Review is a better place for style questions; SO is more for problem solving. CR can be picky about the completeness of the example.

But I can make a few observations:

  • If you wrap the calculation in a function, naming isn't such a big deal. The names don't have to be globally meaningful.

  • A number of your expressions are generators. itertools functions tend to produce generators (lazy iterators), and you can use generator expressions for the other steps, so memory use shouldn't be much of an issue.
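To illustrate the point about generators, a small sketch assuming Python 3, where filter returns a lazy iterator. The hypothetical square function records every element it processes, showing that building the chain does no work and that elements flow through the stages only when the final result is consumed.

```python
import itertools

calls = []

def square(x):
    calls.append(x)  # record every element that is actually squared
    return x**2

squared = (square(x) for x in range(10))
filtered = filter(lambda x: x < 2, squared)
grouped = itertools.groupby(filtered, lambda x: x % 3)

# Building the chain only creates iterator objects; nothing is squared yet.
assert calls == []

result = [sum(group) for key, group in grouped]
# Consuming the result pulls each input element through every stage exactly once.
assert calls == list(range(10))
```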


    import itertools

    def better_name(input):
        squared = (x**2 for x in input)   # gen expression
        filtered = filter(lambda x: x < 2, squared)
        grouped = itertools.groupby(filtered, lambda x: x % 3)
        result = (sum(group) for key, group in grouped)
        return result

    list(better_name(input))

Using def functions instead of lambdas can make the code clearer, but there's a trade-off. Your lambdas are simple enough that I'd keep them.
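A sketch of the same function with the lambdas replaced by named functions; is_small and mod3 are hypothetical names chosen for illustration. It also checks the question's requirement that the pipeline work for both a list and a one-shot generator.

```python
import itertools

def is_small(x):
    """Filter predicate: keep values below 2."""
    return x < 2

def mod3(x):
    """Group key: remainder modulo 3."""
    return x % 3

def better_name(data):
    squared = (x**2 for x in data)
    filtered = filter(is_small, squared)
    grouped = itertools.groupby(filtered, mod3)
    return (sum(group) for key, group in grouped)

# The same function works for a list and for a one-shot generator.
from_list = list(better_name([-3, -2, -1, 0, 1, 2, 3]))
from_gen = list(better_name(x for x in range(-3, 4)))
```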

Your 2nd option is more readable than the 1st. The order of the expressions guides my reading and mental evaluation. In the 1st it's hard to identify the inner-most, or first, evaluation. And groupby is a complex operation, so any help in compartmentalizing the action is welcome.


Following the filter docs, these are equivalent:

    filtered = filter(lambda x: x < 2, squared)
    filtered = (x for x in squared if x < 2)
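A quick check of that equivalence on a sample input. The Python 3 documentation describes filter(function, iterable) as equivalent to the generator expression (item for item in iterable if function(item)) when function is not None.

```python
squared = [x**2 for x in range(-3, 4)]  # [9, 4, 1, 0, 1, 4, 9]

via_filter = list(filter(lambda x: x < 2, squared))
via_genexpr = list(x for x in squared if x < 2)
```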

I was missing a return. The function could return a generator, as shown, or an evaluated list.
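A sketch of the trade-off between the two return styles, using hypothetical functions as_generator and as_list: a returned generator is lazy but can be iterated only once, whereas a list can be re-read at the cost of holding every result in memory.

```python
import itertools

def as_generator(data):
    # Lazy: no element is processed until the caller iterates.
    grouped = itertools.groupby((x for x in data if x < 2), lambda x: x % 3)
    return (sum(group) for key, group in grouped)

def as_list(data):
    # Eager: evaluates everything once and stores the results.
    return list(as_generator(data))

gen = as_generator([1, 0, 1])
first_pass = list(gen)
second_pass = list(gen)  # the generator is already exhausted
lst = as_list([1, 0, 1])
```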

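groupby's key function is not passed as a keyfunc keyword argument. The parameter is named key, and it can be passed positionally (as in better_name) or as key=.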
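A short check, assuming Python 3, where the signature is documented as groupby(iterable, key=None): the key function works positionally or as key=, while keyfunc= is rejected with a TypeError.

```python
import itertools

data = [1, 0, 1]

# The key function may be passed positionally...
positional = [(k, list(g)) for k, g in itertools.groupby(data, lambda x: x % 3)]
# ...or by its documented keyword name, key.
keyword = [(k, list(g)) for k, g in itertools.groupby(data, key=lambda x: x % 3)]

# keyfunc is not a recognized keyword, so this raises TypeError.
error = None
try:
    itertools.groupby(data, keyfunc=lambda x: x % 3)
except TypeError as exc:
    error = exc
```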

groupby is a complex function. It returns an iterator that produces (key, group) tuples, where each group is itself an iterator. Returning the groups explicitly makes that more obvious:

    ((key, list(group)) for key, group in grouped)
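Why materializing each group matters: groupby's group iterators share the underlying input, so once groupby has been advanced, earlier groups no longer yield their elements (the itertools documentation notes that the previous group is no longer visible). A small sketch contrasting stale groups with groups copied into lists while they are current:

```python
import itertools

data = [1, 1, 0, 0, 1]

# list() advances groupby to the end before any group is read,
# so every group iterator is stale by the time it is consumed.
pairs = list(itertools.groupby(data))
stale = [list(group) for key, group in pairs]

# Copying each group while it is current keeps its elements.
materialized = [(key, list(group)) for key, group in itertools.groupby(data)]
```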

So a code style that clarifies its use is desirable.

