python - Is there a dynamic way to use a for loop (or any other loop) that works on big data?


data.csv file (sample data):

    taluka  crop  village  area
    t1      c1    v1       11
    t1      c1    v2       15
    t1      c1    v3       3
    t1      c1    v4       1
    t1      c1    v5       2
    t1      c2    v1       12
    t1      c2    v2       16
    t1      c2    v3       4
    t1      c2    v4       100
    t1      c2    v5       52
    t1      c3    v1       47
    t1      c3    v2       15
    t1      c3    v3       21
    t1      c3    v4       5
    t1      c3    v5       7
    t1      c4    v1       20
    t1      c4    v2       14
    t1      c4    v3       18
    t1      c4    v4       5
    t1      c4    v5       24
    t2      c1    v1       21
    t2      c1    v2       20
    t2      c1    v3       14
    t2      c1    v4       7
    t2      c1    v5       8
    t2      c2    v1       18
    t2      c2    v2       3
    t2      c2    v3       12
    t2      c2    v4       78
    t2      c2    v5       56
    t2      c3    v1       16
    t2      c3    v2       11
    t2      c3    v3       15
    t2      c3    v2       45
    t2      c3    v3       2
    t2      c4    v1       3
    t2      c4    v2       12
    t2      c4    v3       12
    t2      c4    v4       44
    t2      c4    v5       10

I want to find out which villages have a high-risk, medium-risk, or low-risk area for a particular crop in a particular taluka.

I have 500 talukas in total. Each taluka has 10 to 14 crops and 100 to 200 villages.

So, for example, for taluka 1 (e.g. Thane) and crop 1 (e.g. paddy), I want to find out which villages are high risk, medium risk, and low risk, using the percentile method.
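To make the percentile method concrete, here is a minimal sketch for a single taluka and crop (t1, c1 from the sample data). It assumes that "percentile method" means splitting the areas into three equal-sized groups (terciles), which is what `pd.qcut(..., 3)` does; the variable names are illustrative only.

```python
import pandas as pd

# Areas of villages v1..v5 for taluka t1, crop c1 (taken from the sample data).
area = pd.Series([11, 15, 3, 1, 2], index=['v1', 'v2', 'v3', 'v4', 'v5'])

labels = ['low risk', 'medium risk', 'high risk']

# The two cut points that split the values into three equal-sized groups.
cut_points = area.quantile([1 / 3, 2 / 3])
print(cut_points)

# pd.qcut derives equivalent quantile-based bins and labels each value.
level = pd.qcut(area, 3, labels=labels)
print(level)
```

Villages whose area falls at or below the first cut point are labelled low risk, those above the second cut point high risk, and the rest medium risk.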

I have done the work, but the problem is that my code is not dynamic. I need to type out each taluka and each crop by hand, and there are many combinations. So I need to do this dynamically, using a loop (i.e. a for loop or an if statement), and I am stuck on that part.

Please see my code.

    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt

    df = pd.read_csv("/home/desktop/data.csv")
    df.head()

    ## part 1: partition by taluka
    t1 = df[df['taluka'] == 't1']
    t2 = df[df['taluka'] == 't2']

    ## part 2: partition by crop within each taluka
    t1_c1 = t1[t1['crop'] == 'c1']
    t1_c2 = t1[t1['crop'] == 'c2']
    t1_c3 = t1[t1['crop'] == 'c3']
    t1_c4 = t1[t1['crop'] == 'c4']

    t2_c1 = t2[t2['crop'] == 'c1']
    t2_c2 = t2[t2['crop'] == 'c2']
    t2_c3 = t2[t2['crop'] == 'c3']
    t2_c4 = t2[t2['crop'] == 'c4']

    ## sort each subset by area, in descending order
    t1_c1 = t1_c1.sort_values('area', ascending=False)
    t1_c2 = t1_c2.sort_values('area', ascending=False)
    t1_c3 = t1_c3.sort_values('area', ascending=False)
    t1_c4 = t1_c4.sort_values('area', ascending=False)

    t2_c1 = t2_c1.sort_values('area', ascending=False)
    t2_c2 = t2_c2.sort_values('area', ascending=False)
    t2_c3 = t2_c3.sort_values('area', ascending=False)
    t2_c4 = t2_c4.sort_values('area', ascending=False)

    ## add a risk level for each crop in each taluka
    t1_c1['level'] = pd.qcut(t1_c1['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
    t1_c2['level'] = pd.qcut(t1_c2['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
    t1_c3['level'] = pd.qcut(t1_c3['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
    t1_c4['level'] = pd.qcut(t1_c4['area'], 3, labels=['low risk', 'medium risk', 'high risk'])

    t2_c1['level'] = pd.qcut(t2_c1['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
    t2_c2['level'] = pd.qcut(t2_c2['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
    t2_c3['level'] = pd.qcut(t2_c3['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
    t2_c4['level'] = pd.qcut(t2_c4['area'], 3, labels=['low risk', 'medium risk', 'high risk'])

    print(t1_c1)

So here, for crop c1 in taluka t1, I can see which villages are in the high-risk area, which are in the low-risk area, and so on.

How can I do this in a loop? I have to reduce the code. And how can I use this code for 500 talukas?
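As a sketch of what the loop version could look like: iterating over `df.groupby(['taluka', 'crop'])` yields one sub-frame per taluka and crop combination, so no combination has to be typed by hand. The small inline DataFrame stands in for data.csv, and `pieces` and `result` are hypothetical names used only for illustration.

```python
import pandas as pd

# A small stand-in for data.csv: two crops of taluka t1 from the sample data.
df = pd.DataFrame({
    'taluka':  ['t1'] * 10,
    'crop':    ['c1'] * 5 + ['c2'] * 5,
    'village': ['v1', 'v2', 'v3', 'v4', 'v5'] * 2,
    'area':    [11, 15, 3, 1, 2, 12, 16, 4, 100, 52],
})

labels = ['low risk', 'medium risk', 'high risk']
pieces = []

# Each iteration receives the group key and the rows belonging to that group.
for (taluka, crop), group in df.groupby(['taluka', 'crop']):
    group = group.sort_values('area', ascending=False).copy()
    group['level'] = pd.qcut(group['area'], 3, labels=labels)
    pieces.append(group)

# Stitch the labelled groups back into one DataFrame.
result = pd.concat(pieces, ignore_index=True)
print(result)
```

The same loop runs unchanged whether the file holds 2 talukas or 500, because the groups are discovered from the data rather than listed in the code.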

I think you need groupby with apply and a custom function:

    def f(x):
        labels = ['low risk', 'medium risk', 'high risk']
        x['level'] = pd.qcut(x['area'].sort_values(ascending=False), 3, labels=labels)
        return x

    # group_keys=False keeps the original row index in the result
    df1 = df.groupby(['taluka', 'crop'], group_keys=False).apply(f)

    print(df1)
       taluka crop village  area        level
    0      t1   c1      v1    11    high risk
    1      t1   c1      v2    15    high risk
    2      t1   c1      v3     3  medium risk
    3      t1   c1      v4     1     low risk
    4      t1   c1      v5     2     low risk
    5      t1   c2      v1    12     low risk
    6      t1   c2      v2    16  medium risk
    7      t1   c2      v3     4     low risk
    8      t1   c2      v4   100    high risk
    9      t1   c2      v5    52    high risk
    10     t1   c3      v1    47    high risk
    11     t1   c3      v2    15  medium risk
    12     t1   c3      v3    21    high risk
    13     t1   c3      v4     5     low risk
    14     t1   c3      v5     7     low risk
    15     t1   c4      v1    20    high risk
    16     t1   c4      v2    14     low risk
    17     t1   c4      v3    18  medium risk
    18     t1   c4      v4     5     low risk
    19     t1   c4      v5    24    high risk
    20     t2   c1      v1    21    high risk
    21     t2   c1      v2    20    high risk
    22     t2   c1      v3    14  medium risk
    23     t2   c1      v4     7     low risk
    24     t2   c1      v5     8     low risk
    25     t2   c2      v1    18  medium risk
    26     t2   c2      v2     3     low risk
    27     t2   c2      v3    12     low risk
    28     t2   c2      v4    78    high risk
    29     t2   c2      v5    56    high risk
    30     t2   c3      v1    16    high risk
    31     t2   c3      v2    11     low risk
    32     t2   c3      v3    15  medium risk
    33     t2   c3      v2    45    high risk
    34     t2   c3      v3     2     low risk
    35     t2   c4      v1     3     low risk
    36     t2   c4      v2    12  medium risk
    37     t2   c4      v3    12  medium risk
    38     t2   c4      v4    44    high risk
    39     t2   c4      v5    10     low risk

EDIT: It is also possible to add sort_values at the end:

    df1 = df1.sort_values(['taluka', 'crop', 'area'], ascending=[True, True, False])
    print(df1)
       taluka crop village  area        level
    1      t1   c1      v2    15    high risk
    0      t1   c1      v1    11    high risk
    2      t1   c1      v3     3  medium risk
    4      t1   c1      v5     2     low risk
    3      t1   c1      v4     1     low risk
    8      t1   c2      v4   100    high risk
    9      t1   c2      v5    52    high risk
    6      t1   c2      v2    16  medium risk
    5      t1   c2      v1    12     low risk
    7      t1   c2      v3     4     low risk
    10     t1   c3      v1    47    high risk
    12     t1   c3      v3    21    high risk
    11     t1   c3      v2    15  medium risk
    14     t1   c3      v5     7     low risk
    13     t1   c3      v4     5     low risk
    19     t1   c4      v5    24    high risk
    15     t1   c4      v1    20    high risk
    17     t1   c4      v3    18  medium risk
    16     t1   c4      v2    14     low risk
    18     t1   c4      v4     5     low risk
    20     t2   c1      v1    21    high risk
    21     t2   c1      v2    20    high risk
    22     t2   c1      v3    14  medium risk
    24     t2   c1      v5     8     low risk
    23     t2   c1      v4     7     low risk
    28     t2   c2      v4    78    high risk
    29     t2   c2      v5    56    high risk
    25     t2   c2      v1    18  medium risk
    27     t2   c2      v3    12     low risk
    26     t2   c2      v2     3     low risk
    33     t2   c3      v2    45    high risk
    30     t2   c3      v1    16    high risk
    32     t2   c3      v3    15  medium risk
    31     t2   c3      v2    11     low risk
    34     t2   c3      v3     2     low risk
    38     t2   c4      v4    44    high risk
    36     t2   c4      v2    12  medium risk
    37     t2   c4      v3    12  medium risk
    39     t2   c4      v5    10     low risk
    35     t2   c4      v1     3     low risk

Or (slower) sort inside each group:

    def f(x):
        labels = ['low risk', 'medium risk', 'high risk']
        x = x.sort_values('area', ascending=False)
        x['level'] = pd.qcut(x['area'], 3, labels=labels)
        return x

    df1 = df.groupby(['taluka', 'crop']).apply(f).reset_index(drop=True)
    print(df1)
       taluka crop village  area        level
    0      t1   c1      v2    15    high risk
    1      t1   c1      v1    11    high risk
    2      t1   c1      v3     3  medium risk
    3      t1   c1      v5     2     low risk
    4      t1   c1      v4     1     low risk
    5      t1   c2      v4   100    high risk
    6      t1   c2      v5    52    high risk
    7      t1   c2      v2    16  medium risk
    8      t1   c2      v1    12     low risk
    9      t1   c2      v3     4     low risk
    10     t1   c3      v1    47    high risk
    11     t1   c3      v3    21    high risk
    12     t1   c3      v2    15  medium risk
    13     t1   c3      v5     7     low risk
    14     t1   c3      v4     5     low risk
    15     t1   c4      v5    24    high risk
    16     t1   c4      v1    20    high risk
    17     t1   c4      v3    18  medium risk
    18     t1   c4      v2    14     low risk
    19     t1   c4      v4     5     low risk
    20     t2   c1      v1    21    high risk
    21     t2   c1      v2    20    high risk
    22     t2   c1      v3    14  medium risk
    23     t2   c1      v5     8     low risk
    24     t2   c1      v4     7     low risk
    25     t2   c2      v4    78    high risk
    26     t2   c2      v5    56    high risk
    27     t2   c2      v1    18  medium risk
    28     t2   c2      v3    12     low risk
    29     t2   c2      v2     3     low risk
    30     t2   c3      v2    45    high risk
    31     t2   c3      v1    16    high risk
    32     t2   c3      v3    15  medium risk
    33     t2   c3      v2    11     low risk
    34     t2   c3      v3     2     low risk
    35     t2   c4      v4    44    high risk
    36     t2   c4      v2    12  medium risk
    37     t2   c4      v3    12  medium risk
    38     t2   c4      v5    10     low risk
    39     t2   c4      v1     3     low risk
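As a possible alternative to `apply`, the level can be computed with `transform` on the area column alone, which leaves the rows and index of `df` untouched. This is only a sketch under assumptions: the inline DataFrame stands in for data.csv, and `codes` and `code_to_label` are hypothetical names. It asks `pd.qcut` for integer bin codes (`labels=False`) and maps them to names afterwards.

```python
import pandas as pd

# A small stand-in for data.csv: (t1, c1) and (t2, c4) from the sample data.
df = pd.DataFrame({
    'taluka':  ['t1'] * 5 + ['t2'] * 5,
    'crop':    ['c1'] * 5 + ['c4'] * 5,
    'village': ['v1', 'v2', 'v3', 'v4', 'v5'] * 2,
    'area':    [11, 15, 3, 1, 2, 3, 12, 12, 44, 10],
})

# Bin code 0 is the lowest third of a group, 2 the highest.
code_to_label = {0: 'low risk', 1: 'medium risk', 2: 'high risk'}

# transform returns one value per original row, aligned on the index.
codes = df.groupby(['taluka', 'crop'])['area'].transform(
    lambda s: pd.qcut(s, 3, labels=False)
)
df['level'] = codes.map(code_to_label)
print(df)
```

One caveat for real data: if a group contains many identical areas, `pd.qcut` can raise a `ValueError` about non-unique bin edges. Ranking the values first, for example with `s.rank(method='first')`, is one common workaround, at the cost of splitting ties arbitrarily.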
