Python Idiom for applying sequential steps to an iterable
When doing data processing tasks I often find myself applying a series of compositions, vectorized functions, etc. to an input iterable of data to generate a final result. Ideally I'd like something that works for both lists and generators (in addition to any other iterable). I can think of a number of approaches to structuring code to accomplish this, but every way I can think of has one or more ways in which it feels unclean or unidiomatic to me. I have outlined below the different methods I can think of, but my question is: is there a recommended, idiomatic way to do this?
Methods I can think of, illustrated with a simple, representative example:
Write it as one large expression

    result = [sum(group)
              for key, group in itertools.groupby(
                  filter(lambda x: x <= 2, [x**2 for x in input]),
                  keyfunc=lambda x: x % 3)]

This is often quite difficult to read for any non-trivial sequence of steps. When reading through the code, one encounters each step in reverse order.
Save each step under a different variable name

    squared = [x**2 for x in input]
    filtered = filter(lambda x: x < 2, squared)
    grouped = itertools.groupby(filtered, keyfunc=lambda x: x % 3)
    result = [sum(group) for key, group in grouped]
This introduces a number of local variables that can be hard to name descriptively; additionally, if the result of some or all of the intermediate steps is large, keeping them around could be wasteful of memory. If one wants to add a step to this process, care must be taken that all variable names get updated correctly. For example, if we wished to divide every number by 2, we would add the line halved = [x / 2.0 for x in filtered], but would also have to remember to change filtered to halved in the following line.
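As a rough illustration of the renaming hazard described above, here is the second approach with the extra halving step added. The sample data and the name `values` are made up for this sketch, and the key function is passed to `groupby` positionally so that the snippet runs.

```python
import itertools

values = [-1, 0, 1, 1, 2]  # hypothetical sample input

squared = [x**2 for x in values]
filtered = filter(lambda x: x < 2, squared)
halved = [x / 2.0 for x in filtered]  # the newly added step
# The next line had to change from `filtered` to `halved`; forgetting
# that rename would silently skip the new step.
grouped = itertools.groupby(halved, lambda x: x % 3)
result = [sum(group) for key, group in grouped]
print(result)
```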
Store each step under the same variable name

    tmp = [x**2 for x in input]
    tmp = filter(lambda x: x < 2, tmp)
    tmp = itertools.groupby(tmp, keyfunc=lambda x: x % 3)
    result = [sum(group) for key, group in tmp]

I guess this seems to me the least bad of these options, but storing things in a generically named placeholder variable feels un-Pythonic to me, and makes me suspect that there is some better way out there.
Code Review is often a better place for style questions; this site is more for problem solving. But CR can be picky about the completeness of the example.
But I can make a few observations:
If you wrap this calculation in a function, naming isn't such a big deal. The names don't have to be globally meaningful.
A number of your expressions are generators. itertools tends to produce generators or gen. expressions, so memory use shouldn't be much of an issue.

    def better_name(input):
        squared = (x**2 for x in input)  # gen expression
        filtered = filter(lambda x: x < 2, squared)
        grouped = itertools.groupby(filtered, lambda x: x % 3)
        result = (sum(group) for key, group in grouped)
        return result

    list(better_name(input))

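For reference, a self-contained version of that function that can be run as is. The function name and the sample data are placeholders invented for this sketch, and it assumes Python 3, where `filter` returns a lazy iterator.

```python
import itertools

def sum_small_squares_by_mod3(values):
    """Hypothetical name: square, filter, group, and sum lazily."""
    squared = (x**2 for x in values)             # generator expression
    filtered = filter(lambda x: x < 2, squared)  # lazy in Python 3
    # groupby only merges *consecutive* items that share a key
    grouped = itertools.groupby(filtered, lambda x: x % 3)
    return (sum(group) for key, group in grouped)

sample = [-1, 0, 1, 1, 2]
result = list(sum_small_squares_by_mod3(sample))
print(result)  # [1, 0, 2]

# The same function accepts a generator as input
from_generator = list(sum_small_squares_by_mod3(x for x in sample))
```

Nothing is computed until `list(...)` consumes the returned generator, so no intermediate list is held in memory.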
Using def functions instead of lambdas can make the code clearer. There's a trade-off, though. Your lambdas are simple enough that I'd probably keep them.
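To make that trade-off concrete, here is a sketch of the same pipeline with the lambdas replaced by small named functions. The names `is_small`, `remainder_mod_3`, and `process` are invented for illustration.

```python
import itertools

def is_small(x):
    """Predicate for the filter step (hypothetical name)."""
    return x < 2

def remainder_mod_3(x):
    """Key function for the grouping step (hypothetical name)."""
    return x % 3

def process(values):
    squared = (x**2 for x in values)
    filtered = filter(is_small, squared)
    grouped = itertools.groupby(filtered, remainder_mod_3)
    return [sum(group) for _, group in grouped]

named_result = process([-1, 0, 1, 1, 2])
print(named_result)  # [1, 0, 2]
```

The named version is longer, but each step can carry a docstring and be tested on its own; for predicates this small the lambdas are arguably just as readable.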
Your 2nd option is much more readable than the 1st. The order of the expressions guides my reading and mental evaluation. In the 1st it's hard to identify the innermost or first evaluation. And groupby is a complex operation, so any help in compartmentalizing the action is welcome.
Following the filter docs, these are equivalent:

    filtered = filter(lambda x: x < 2, squared)
    filtered = (x for x in squared if x < 2)
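A quick check of that equivalence on made-up data. This assumes Python 3, where `filter` returns an iterator; in Python 2 it returned a list.

```python
squared = [x**2 for x in [-1, 0, 1, 1, 2]]  # [1, 0, 1, 1, 4]

# Both forms are lazy; list() forces them so they can be compared
via_filter = list(filter(lambda x: x < 2, squared))
via_genexp = list(x for x in squared if x < 2)

print(via_filter == via_genexp)  # True
```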
I was missing the return. The function could return a generator as I show, or the evaluated list.
groupby's keyfunc is not a keyword argument, but rather a positional one.
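A short sketch of the calling conventions on invented sample values. In Python 3 the key function can also be passed by keyword, but the keyword is named `key`; `keyfunc=` is rejected with a `TypeError`.

```python
import itertools

values = [1, 0, 1, 1]
mod3 = lambda x: x % 3

# Key function passed positionally
positional = [(k, list(g)) for k, g in itertools.groupby(values, mod3)]

# Key function passed with the keyword name groupby accepts in Python 3
keyword = [(k, list(g)) for k, g in itertools.groupby(values, key=mod3)]

# `keyfunc` is not a recognised keyword
try:
    itertools.groupby(values, keyfunc=mod3)
    keyfunc_accepted = True
except TypeError:
    keyfunc_accepted = False

print(positional == keyword, keyfunc_accepted)  # True False
```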
groupby is a complex function. It returns a generator that produces tuples, an element of which is a generator itself. Returning this makes that more obvious:

    ((key, list(group)) for key, group in grouped)

So a code style that clarifies its use is desirable.
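The nesting can be seen by stepping through `groupby` by hand. The sample values are made up. According to the itertools documentation, each group shares the underlying iterable with the `groupby` object, so a group should be turned into a list before moving on to the next one if its contents are needed later.

```python
import itertools

values = [1, 0, 1, 1]
grouped = itertools.groupby(values, lambda x: x % 3)

# Pull the first (key, group) pair manually
key, group = next(grouped)
group_is_lazy = not isinstance(group, (list, tuple))  # an iterator, not a list
first = (key, list(group))  # materialise before advancing `grouped`

# Materialise the remaining groups as they are produced
rest = [(key, list(group)) for key, group in grouped]
materialised = [first] + rest
print(materialised)  # [(1, [1]), (0, [0]), (1, [1, 1])]
```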