python - Add Subtotals by Group to Pandas DataFrame
I find myself trying to accomplish in pandas something I would normally do with R's data.table. I think the best way to unambiguously describe what I want is to show the analogous R operation:
fruit <- rep(c('apples', 'oranges'), 2)
date <- rep(c('2017-07-01', '2017-07-02'), each=2)
count <- 1:4
dat <- data.table(date, count, fruit)
The dat variable houses data that looks like this:
         date count   fruit
1: 2017-07-01     1  apples
2: 2017-07-01     2 oranges
3: 2017-07-02     3  apples
4: 2017-07-02     4 oranges
Suppose I want to add up the counts by date, call the "fruit" "total", and add the result to the original data. To achieve this in R, I might do the following (I suspect this isn't the most elegant way, but I'm not asking about R right now...):
dat.total <- rbind(dat[, list(count=sum(count), fruit='total'), list(date)], dat)
And sure enough, dat.total looks like this:
         date count   fruit
1: 2017-07-01     3   total
2: 2017-07-02     7   total
3: 2017-07-01     1  apples
4: 2017-07-01     2 oranges
5: 2017-07-02     3  apples
6: 2017-07-02     4 oranges
So... I'm trying to do this in pandas, and I can't figure it out. Here's how far I've gotten:
import pandas as pd
fruit = ['apples', 'oranges'] * 2
date = ['2017-07-01', '2017-07-01', '2017-07-02', '2017-07-02']
count = [1, 2, 3, 4]
dat = pd.DataFrame({'fruit': fruit, 'date': date, 'count': count})
So far, so good. Here's dat:
   count        date    fruit
0      1  2017-07-01   apples
1      2  2017-07-01  oranges
2      3  2017-07-02   apples
3      4  2017-07-02  oranges
Some googling got me as far as the sums for each date:
agg = dat.groupby('date').sum()
But this is problematic. agg seems to be a fundamentally different kind of thing from dat. (To put a finger on one specific piece of what I mean: agg.index is the date column, whereas dat.index is the default RangeIndex.)
I can't figure out how to add a "fruit" column with "total" as each value, and even if I could, I don't know how to concat these things because of the different indices / columns.
It seems as though my approach is wrong, but I'm having a hard time figuring this out with Google.
Just to be clear about what I'm trying to do: I'm hoping for a pandas DataFrame similar in spirit to the result of the R transformation I showed at the beginning. I guess the tl;dr is that I'm trying to go from:
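The index mismatch described above can be reproduced in a few lines. The following is only a sketch under assumptions: the variable name flat is hypothetical, and as_index=False is just one way (alongside reset_index()) to keep the grouping key as an ordinary column.

```python
import pandas as pd

# Same data as in the question.
dat = pd.DataFrame({
    'fruit': ['apples', 'oranges'] * 2,
    'date': ['2017-07-01', '2017-07-01', '2017-07-02', '2017-07-02'],
    'count': [1, 2, 3, 4],
})

# Selecting 'count' first keeps the string column 'fruit' out of the sum.
# The result is a Series whose index is the grouping key 'date'.
agg = dat.groupby('date')['count'].sum()
print(type(agg).__name__, agg.index.name)

# as_index=False keeps 'date' as a regular column, so the result has
# the same kind of layout as dat. ('flat' is a hypothetical name.)
flat = dat.groupby('date', as_index=False)['count'].sum()
print(flat)
```

With the key back in a regular column, the aggregated frame has the same shape of layout as dat, which removes the obstacle to stacking the two.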
         date count   fruit
1: 2017-07-01     1  apples
2: 2017-07-01     2 oranges
3: 2017-07-02     3  apples
4: 2017-07-02     4 oranges
to
         date count   fruit
1: 2017-07-01     3   total
2: 2017-07-02     7   total
3: 2017-07-01     1  apples
4: 2017-07-01     2 oranges
5: 2017-07-02     3  apples
6: 2017-07-02     4 oranges
A first step of agg = dat.groupby('date').sum() seems promising, but I have no idea where to go next or whether I'm headed down the wrong road.
Use a combination of groupby, append, and assign:
# df is the question's dat.
# Note: DataFrame.append was removed in pandas 2.0; newer versions need pd.concat.
df.groupby('date')['count'].sum().reset_index() \
    .assign(fruit='total').append(df, ignore_index=True)

         date  count    fruit
0  2017-07-01      3    total
1  2017-07-02      7    total
2  2017-07-01      1   apples
3  2017-07-01      2  oranges
4  2017-07-02      3   apples
5  2017-07-02      4  oranges
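On pandas 2.0 and later, where DataFrame.append no longer exists, the same idea can be written with pd.concat. This is a minimal sketch rather than the answer's original code; the names totals and result are hypothetical, and the final column selection is only there to pin the column order to the one shown in the question.

```python
import pandas as pd

# Same data as in the question (the answer calls it df).
df = pd.DataFrame({
    'fruit': ['apples', 'oranges'] * 2,
    'date': ['2017-07-01', '2017-07-01', '2017-07-02', '2017-07-02'],
    'count': [1, 2, 3, 4],
})

# One subtotal row per date, labelled 'total' in the fruit column.
# ('totals' and 'result' are hypothetical names.)
totals = (
    df.groupby('date', as_index=False)['count'].sum()
      .assign(fruit='total')
)

# Stack the subtotal rows on top of the original rows and renumber the index.
result = pd.concat([totals, df], ignore_index=True)[['date', 'count', 'fruit']]
print(result)
```

pd.concat aligns the two frames by column name, so it does not matter that df and totals list their columns in a different order.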