python - Add Subtotals by Group to Pandas Dataframe

September 15, 2010

I find myself trying to accomplish in pandas something I would normally do with R's data.table. I think the best way I can unambiguously describe what I want is by showing the analogous R operation:

library(data.table)

fruit <- rep(c('apples', 'oranges'), 2)
date <- rep(c('2017-07-01', '2017-07-02'), each=2)
count <- 1:4

dat <- data.table(date, count, fruit)

The dat variable houses data that looks like this:

         date count   fruit
1: 2017-07-01     1  apples
2: 2017-07-01     2 oranges
3: 2017-07-02     3  apples
4: 2017-07-02     4 oranges

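Suppose I want to add up the counts by date, call the "fruit" for those rows "total", and add them to the original data. To achieve this in R, I might do the following (I think this isn't the most elegant way, but I'm not asking about R right now...):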

dat.total <- rbind(dat[, list(count=sum(count), fruit='total'), list(date)],
                   dat)

And sure enough, dat.total looks like this:

         date count   fruit
1: 2017-07-01     3   total
2: 2017-07-02     7   total
3: 2017-07-01     1  apples
4: 2017-07-01     2 oranges
5: 2017-07-02     3  apples
6: 2017-07-02     4 oranges

So... I'm trying to do this in pandas, and I can't figure it out. Here's how far I've gotten:

import pandas as pd

fruit = ['apples', 'oranges'] * 2
date = ['2017-07-01', '2017-07-01', '2017-07-02', '2017-07-02']
count = [1, 2, 3, 4]

dat = pd.DataFrame({'fruit': fruit, 'date': date, 'count': count})

So far, so good. Here's dat:

   count        date    fruit
0      1  2017-07-01   apples
1      2  2017-07-01  oranges
2      3  2017-07-02   apples
3      4  2017-07-02  oranges

Some googling got me as far as the sums for each date:

agg = dat.groupby('date').sum()

But this is problematic. agg seems to be a fundamentally different thing from dat. (To put my finger on one specific piece of what I mean, agg.index is the date column, whereas dat.index is the default RangeIndex.)
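The difference described above can be inspected directly. This is a minimal sketch that rebuilds dat and compares the two indices; selecting the count column explicitly with [['count']] is an assumption added here so the result does not depend on how a given pandas version treats the string fruit column.

```python
import pandas as pd

dat = pd.DataFrame({
    'fruit': ['apples', 'oranges'] * 2,
    'date': ['2017-07-01', '2017-07-01', '2017-07-02', '2017-07-02'],
    'count': [1, 2, 3, 4],
})

# Sum only the numeric column (assumption: not part of the original snippet)
agg = dat.groupby('date')[['count']].sum()

# dat keeps the default integer index, while the grouping key
# has become the index of agg and is no longer a regular column
print(type(dat.index).__name__)
print(agg.index.name)
print(list(agg.columns))
```

So the dates are still present in agg, just stored in the index rather than in a column.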

I can't figure out how to add a "fruit" column with "total" as each value, and even if I could, I don't know how to concat these things because of the different indices / columns.

It seems as though my approach is wrong, but I'm having a hard time figuring this out with Google.

Just to be clear about what I'm trying to do: I'm hoping for a pandas DataFrame similar in spirit to the R transformation I showed at the beginning. I guess the tl;dr is that I'm trying to go from:

         date count   fruit
1: 2017-07-01     1  apples
2: 2017-07-01     2 oranges
3: 2017-07-02     3  apples
4: 2017-07-02     4 oranges

to:

         date count   fruit
1: 2017-07-01     3   total
2: 2017-07-02     7   total
3: 2017-07-01     1  apples
4: 2017-07-01     2 oranges
5: 2017-07-02     3  apples
6: 2017-07-02     4 oranges

A first step of agg = dat.groupby('date').sum() seems promising, but I have no idea where to go next or if I'm headed down the wrong road.

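Use a combination of groupby, append, and assign (here df is the DataFrame called dat in the question). Note that DataFrame.append was removed in pandas 2.0, so this form needs an older pandas version:

df.groupby('date')['count'].sum().reset_index() \
    .assign(fruit='total').append(df, ignore_index=True)

         date  count    fruit
0  2017-07-01      3    total
1  2017-07-02      7    total
2  2017-07-01      1   apples
3  2017-07-01      2  oranges
4  2017-07-02      3   apples
5  2017-07-02      4  oranges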
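On pandas 2.0 and later, where DataFrame.append is no longer available, the same idea can be sketched with pd.concat. This is a variation on the answer above and not the original answer's code; the names totals and result are introduced here purely for illustration.

```python
import pandas as pd

df = pd.DataFrame({
    'fruit': ['apples', 'oranges'] * 2,
    'date': ['2017-07-01', '2017-07-01', '2017-07-02', '2017-07-02'],
    'count': [1, 2, 3, 4],
})

# One row per date: sum the counts, move 'date' from the index back
# into a regular column, then label every row with fruit='total'
totals = (
    df.groupby('date')['count'].sum()
      .reset_index()
      .assign(fruit='total')
)

# Stack the subtotal rows on top of the original rows. concat aligns
# columns by name, and ignore_index=True renumbers the rows 0..n-1
result = pd.concat([totals, df], ignore_index=True)
print(result)
```

Because concat matches columns by name, the different column order in df and totals does not matter; the result takes its column order from the first frame in the list.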
