python - Filtering a pandas Series using a lambda expression that operates on individual elements
I'm trying to use pandas for chaining map and filter operations. I've come across several options, partly outlined here: "pandas: how to filter a Series".
To summarize:
    import pandas as pd

    s = pd.Series(range(10))
    s.where(s > 4).dropna()
    s.where(lambda x: x > 4).dropna()
    s.loc[s > 4]
    s.loc[lambda x: x > 4]
    s.to_frame(name='x').query("x > 4")
This is fine for numerical comparisons and equality checks, but it doesn't work for predicates involving other operations. As a simple example, consider matching against the first character of a string.
    s = pd.Series(['aa', 'ab', 'ba'])
    s.loc[lambda x: x.startswith('a')]  # fails
This fails with the message "Series has no attribute 'startswith'", since the argument x passed to the lambda expression in the second line is the Series itself, rather than the individual elements it contains.
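To make this concrete, here is a minimal sketch that records what the callable given to .loc actually receives. The `seen` list and the `always_true` helper are hypothetical names used only for illustration.

```python
import pandas as pd

s = pd.Series(['aa', 'ab', 'ba'])

seen = []  # illustration only: type names of the arguments the callable receives

def always_true(x):
    seen.append(type(x).__name__)
    # Return an all-True boolean mask so that .loc selects every row
    return pd.Series(True, index=x.index)

result = s.loc[always_true]

# Calling a str method directly on the Series raises AttributeError
try:
    s.loc[lambda x: x.startswith('a')]
    error = None
except AttributeError as exc:
    error = str(exc)
```

Here `seen` should contain only 'Series', which confirms that the predicate is applied to the whole Series and is expected to return a boolean mask, not a single True/False value.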
Interestingly, map does allow element-wise access:
    pd.Series(list('abcd')).map(lambda x: x.upper())  # results in ['A', 'B', 'C', 'D'], even though Series has no upper method
While there are clever ways to handle the startswith example, I'm hoping to find a more general solution where a Series can be filtered using a function that accepts individual values of the collection, and that ideally allows chaining operations, as in:
    s = (pd.Series(...)
         .map(...)
         .where(...)
         .map(...))

Is this supported in pandas?
Update: Scott provided an answer for cases where the value is a string, which can be handled with Series.str as described in his answer.
But what about cases where the Series contains objects? Is there a way to access their attributes or apply functions to them?
I guess the standard way of managing this case is to de-structure the relevant fields of the object into a DataFrame, with each attribute as a column. Still, there might be cases where you want to transform a collection of objects with map and filter (loc/where), without having to disassemble the complex type into a DataFrame and convert it back.
I'm partly trying to find an alternative to the standard map()/filter() functions in Python, where the operations have to be nested in reverse order, i.e.:
    map(function3, filter(function2, map(function1, collection)))
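As a small illustration of that inside-out reading order, here is a sketch in plain Python. The functions `add_one`, `is_even` and `square` are hypothetical stand-ins for function1, function2 and function3.

```python
# Hypothetical stand-ins for function1, function2 and function3
def add_one(x):
    return x + 1

def is_even(x):
    return x % 2 == 0

def square(x):
    return x * x

collection = range(5)

# Reads inside-out: add_one runs first, then is_even filters, then square
result = list(map(square, filter(is_even, map(add_one, collection))))
# result == [4, 16]
```

The first operation applied ends up innermost and last on the line, which is what a chained style would avoid.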
Use .str
The string accessor is needed for string operations on a pandas Series.
    s = pd.Series(['aa', 'ab', 'ba'])
    s.loc[lambda x: x.str.startswith('a')]
When using map, you apply the string function to each element, and therefore you don't need the string accessor.
And to @pirsquared's point in the comments, you don't need the lambda at all; you can use boolean indexing.
    s = pd.Series(['aa', 'ab', 'ba'])
    s.loc[s.str.startswith('a')]
s.str.startswith returns a boolean (True/False) Series; when it is placed in brackets, the Series returns the values whose positions align with True.
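For values that are not strings, one possible approach (a sketch, not necessarily the only or best way) combines the two ideas above: inside the callable passed to .loc, use map to apply an element-wise predicate and build the boolean mask. The `Point` class and its attributes are hypothetical, invented only for this example.

```python
import pandas as pd

class Point:
    """Hypothetical object type, for illustration only."""
    def __init__(self, x, y):
        self.x = x
        self.y = y

points = pd.Series([Point(1, 2), Point(5, 1), Point(7, 3)])

result = (
    points
    # element-wise predicate: map builds a boolean mask, .loc applies it
    .loc[lambda ser: ser.map(lambda p: p.x > 4)]
    # element-wise transform on the remaining objects
    .map(lambda p: p.x + p.y)
    # vectorized predicate on the now-numeric Series
    .loc[lambda ser: ser > 6]
)

# The same pattern covers the string example without the .str accessor
strings = pd.Series(['aa', 'ab', 'ba'])
starts_with_a = strings.loc[lambda ser: ser.map(lambda v: v.startswith('a'))]
```

This keeps the operations in reading order within a single chain. Note that the element-wise predicate runs as ordinary Python on every value, so it will raise if the Series contains missing values that lack the attribute or method being used.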