python - Is there a dynamic way, using a for-loop or any other loop, to make this work on big data? -
data.csv file (sample data):
taluka crop village area
t1 c1 v1 11
t1 c1 v2 15
t1 c1 v3 3
t1 c1 v4 1
t1 c1 v5 2
t1 c2 v1 12
t1 c2 v2 16
t1 c2 v3 4
t1 c2 v4 100
t1 c2 v5 52
t1 c3 v1 47
t1 c3 v2 15
t1 c3 v3 21
t1 c3 v4 5
t1 c3 v5 7
t1 c4 v1 20
t1 c4 v2 14
t1 c4 v3 18
t1 c4 v4 5
t1 c4 v5 24
t2 c1 v1 21
t2 c1 v2 20
t2 c1 v3 14
t2 c1 v4 7
t2 c1 v5 8
t2 c2 v1 18
t2 c2 v2 3
t2 c2 v3 12
t2 c2 v4 78
t2 c2 v5 56
t2 c3 v1 16
t2 c3 v2 11
t2 c3 v3 15
t2 c3 v2 45
t2 c3 v3 2
t2 c4 v1 3
t2 c4 v2 12
t2 c4 v3 12
t2 c4 v4 44
t2 c4 v5 10
I want to find out which villages have high-risk, medium-risk, and low-risk area for a particular crop in a particular taluka.
I have 500 talukas in total; under those 500 talukas there are 10-14 crops, and in each taluka there are 100-200 villages.
So I want to find out, for taluka-1 (i.e. Thane) and crop-1 (i.e. paddy), which villages are under high risk, medium risk, and low risk, using the percentile method.
I have done the work, but the problem is that the code is not dynamic. I need to type out each taluka and each crop by hand, and there are many combinations. So I need to do it dynamically, using loops (i.e. for loops, if conditions), and I am stuck on that part.
Please see my code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

df = pd.read_csv("/home/desktop/data.csv")
df.head()

## part-1: partition by taluka
t1 = df[df['taluka'] == 't1']
t2 = df[df['taluka'] == 't2']

## part-2: partition crop-wise within each taluka
t1_c1 = t1[t1['crop'] == 'c1']
t1_c2 = t1[t1['crop'] == 'c2']
t1_c3 = t1[t1['crop'] == 'c3']
t1_c4 = t1[t1['crop'] == 'c4']
t2_c1 = t2[t2['crop'] == 'c1']
t2_c2 = t2[t2['crop'] == 'c2']
t2_c3 = t2[t2['crop'] == 'c3']
t2_c4 = t2[t2['crop'] == 'c4']

## sort each partition by area in descending order
t1_c1 = t1_c1.sort_values('area', ascending=False)
t1_c2 = t1_c2.sort_values('area', ascending=False)
t1_c3 = t1_c3.sort_values('area', ascending=False)
t1_c4 = t1_c4.sort_values('area', ascending=False)
t2_c1 = t2_c1.sort_values('area', ascending=False)
t2_c2 = t2_c2.sort_values('area', ascending=False)
t2_c3 = t2_c3.sort_values('area', ascending=False)
t2_c4 = t2_c4.sort_values('area', ascending=False)

## add risk levels for each crop in each taluka
t1_c1['level'] = pd.qcut(t1_c1['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
t1_c2['level'] = pd.qcut(t1_c2['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
t1_c3['level'] = pd.qcut(t1_c3['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
t1_c4['level'] = pd.qcut(t1_c4['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
t2_c1['level'] = pd.qcut(t2_c1['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
t2_c2['level'] = pd.qcut(t2_c2['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
t2_c3['level'] = pd.qcut(t2_c3['area'], 3, labels=['low risk', 'medium risk', 'high risk'])
t2_c4['level'] = pd.qcut(t2_c4['area'], 3, labels=['low risk', 'medium risk', 'high risk'])

print(t1_c1)
So here, for crop c1 in taluka t1, I can see which villages are in the high-risk area, the low-risk area, and so on.
How do I do this with a loop? I need to reduce the code. And will the code still work for 500 talukas?
I think you need groupby, apply, and a custom function:
def f(x):
    labels = ['low risk', 'medium risk', 'high risk']
    x['level'] = pd.qcut(x['area'].sort_values(ascending=False), 3, labels=labels)
    return x

df1 = df.groupby(['taluka', 'crop']).apply(f)
print (df1)

   taluka crop village  area        level
0      t1   c1      v1    11    high risk
1      t1   c1      v2    15    high risk
2      t1   c1      v3     3  medium risk
3      t1   c1      v4     1     low risk
4      t1   c1      v5     2     low risk
5      t1   c2      v1    12     low risk
6      t1   c2      v2    16  medium risk
7      t1   c2      v3     4     low risk
8      t1   c2      v4   100    high risk
9      t1   c2      v5    52    high risk
10     t1   c3      v1    47    high risk
11     t1   c3      v2    15  medium risk
12     t1   c3      v3    21    high risk
13     t1   c3      v4     5     low risk
14     t1   c3      v5     7     low risk
15     t1   c4      v1    20    high risk
16     t1   c4      v2    14     low risk
17     t1   c4      v3    18  medium risk
18     t1   c4      v4     5     low risk
19     t1   c4      v5    24    high risk
20     t2   c1      v1    21    high risk
21     t2   c1      v2    20    high risk
22     t2   c1      v3    14  medium risk
23     t2   c1      v4     7     low risk
24     t2   c1      v5     8     low risk
25     t2   c2      v1    18  medium risk
26     t2   c2      v2     3     low risk
27     t2   c2      v3    12     low risk
28     t2   c2      v4    78    high risk
29     t2   c2      v5    56    high risk
30     t2   c3      v1    16    high risk
31     t2   c3      v2    11     low risk
32     t2   c3      v3    15  medium risk
33     t2   c3      v2    45    high risk
34     t2   c3      v3     2     low risk
35     t2   c4      v1     3     low risk
36     t2   c4      v2    12  medium risk
37     t2   c4      v3    12  medium risk
38     t2   c4      v4    44    high risk
39     t2   c4      v5    10     low risk
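Once the level column exists, answering the original question (which villages are high risk for one taluka/crop combination?) is an ordinary boolean filter. A minimal sketch, rebuilding only the t1/c1 slice of the sample data above rather than reading the real data.csv:

```python
import pandas as pd

# Rebuild only the t1/c1 slice of the sample data.
t1_c1 = pd.DataFrame({
    'taluka': ['t1'] * 5,
    'crop': ['c1'] * 5,
    'village': ['v1', 'v2', 'v3', 'v4', 'v5'],
    'area': [11, 15, 3, 1, 2],
})
t1_c1['level'] = pd.qcut(t1_c1['area'], 3,
                         labels=['low risk', 'medium risk', 'high risk'])

# Boolean filter: high-risk villages of taluka t1, crop c1.
high = t1_c1[(t1_c1['taluka'] == 't1') &
             (t1_c1['crop'] == 'c1') &
             (t1_c1['level'] == 'high risk')]
print(high['village'].tolist())   # ['v1', 'v2']
```

The same filter works unchanged on the full labelled frame df1; only the taluka/crop values in the mask change.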
EDIT: it is also possible to add sort_values at the end:
df1 = df1.sort_values(['taluka', 'crop', 'area'], ascending=[True, True, False])
print (df1)

   taluka crop village  area        level
1      t1   c1      v2    15    high risk
0      t1   c1      v1    11    high risk
2      t1   c1      v3     3  medium risk
4      t1   c1      v5     2     low risk
3      t1   c1      v4     1     low risk
8      t1   c2      v4   100    high risk
9      t1   c2      v5    52    high risk
6      t1   c2      v2    16  medium risk
5      t1   c2      v1    12     low risk
7      t1   c2      v3     4     low risk
10     t1   c3      v1    47    high risk
12     t1   c3      v3    21    high risk
11     t1   c3      v2    15  medium risk
14     t1   c3      v5     7     low risk
13     t1   c3      v4     5     low risk
19     t1   c4      v5    24    high risk
15     t1   c4      v1    20    high risk
17     t1   c4      v3    18  medium risk
16     t1   c4      v2    14     low risk
18     t1   c4      v4     5     low risk
20     t2   c1      v1    21    high risk
21     t2   c1      v2    20    high risk
22     t2   c1      v3    14  medium risk
24     t2   c1      v5     8     low risk
23     t2   c1      v4     7     low risk
28     t2   c2      v4    78    high risk
29     t2   c2      v5    56    high risk
25     t2   c2      v1    18  medium risk
27     t2   c2      v3    12     low risk
26     t2   c2      v2     3     low risk
33     t2   c3      v2    45    high risk
30     t2   c3      v1    16    high risk
32     t2   c3      v3    15  medium risk
31     t2   c3      v2    11     low risk
34     t2   c3      v3     2     low risk
38     t2   c4      v4    44    high risk
36     t2   c4      v2    12  medium risk
37     t2   c4      v3    12  medium risk
39     t2   c4      v5    10     low risk
35     t2   c4      v1     3     low risk
Or (slower) with sorting inside each apply call:
def f(x):
    labels = ['low risk', 'medium risk', 'high risk']
    x = x.sort_values('area', ascending=False)
    x['level'] = pd.qcut(x['area'], 3, labels=labels)
    return x
df1 = df.groupby(['taluka', 'crop']).apply(f).reset_index(drop=True)
print (df1)

   taluka crop village  area        level
0      t1   c1      v2    15    high risk
1      t1   c1      v1    11    high risk
2      t1   c1      v3     3  medium risk
3      t1   c1      v5     2     low risk
4      t1   c1      v4     1     low risk
5      t1   c2      v4   100    high risk
6      t1   c2      v5    52    high risk
7      t1   c2      v2    16  medium risk
8      t1   c2      v1    12     low risk
9      t1   c2      v3     4     low risk
10     t1   c3      v1    47    high risk
11     t1   c3      v3    21    high risk
12     t1   c3      v2    15  medium risk
13     t1   c3      v5     7     low risk
14     t1   c3      v4     5     low risk
15     t1   c4      v5    24    high risk
16     t1   c4      v1    20    high risk
17     t1   c4      v3    18  medium risk
18     t1   c4      v2    14     low risk
19     t1   c4      v4     5     low risk
20     t2   c1      v1    21    high risk
21     t2   c1      v2    20    high risk
22     t2   c1      v3    14  medium risk
23     t2   c1      v5     8     low risk
24     t2   c1      v4     7     low risk
25     t2   c2      v4    78    high risk
26     t2   c2      v5    56    high risk
27     t2   c2      v1    18  medium risk
28     t2   c2      v3    12     low risk
29     t2   c2      v2     3     low risk
30     t2   c3      v2    45    high risk
31     t2   c3      v1    16    high risk
32     t2   c3      v3    15  medium risk
33     t2   c3      v2    11     low risk
34     t2   c3      v3     2     low risk
35     t2   c4      v4    44    high risk
36     t2   c4      v2    12  medium risk
37     t2   c4      v3    12  medium risk
38     t2   c4      v5    10     low risk
39     t2   c4      v1     3     low risk
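Since the question explicitly asked for a loop: the same per-group labelling can also be written as an explicit for-loop over the groupby groups. groupby().apply is just the idiomatic shorthand for this. A sketch on a small made-up sample (two talukas, one crop) standing in for the real data.csv; one iteration runs per (taluka, crop) combination, so nothing is hand-typed and it scales to 500 talukas unchanged:

```python
import pandas as pd

# Small sample standing in for the real data.csv.
df = pd.DataFrame({
    'taluka':  ['t1'] * 5 + ['t2'] * 5,
    'crop':    ['c1'] * 10,
    'village': ['v1', 'v2', 'v3', 'v4', 'v5'] * 2,
    'area':    [11, 15, 3, 1, 2, 21, 20, 14, 7, 8],
})

labels = ['low risk', 'medium risk', 'high risk']
pieces = []
# One iteration per (taluka, crop) combination.
for (taluka, crop), group in df.groupby(['taluka', 'crop']):
    group = group.sort_values('area', ascending=False).copy()
    group['level'] = pd.qcut(group['area'], 3, labels=labels)
    pieces.append(group)

# Stitch the labelled groups back into one frame.
df1 = pd.concat(pieces).reset_index(drop=True)
print(df1)
```

On large data the apply version above is preferable; the loop mainly makes the mechanism explicit, and it also gives a natural place for per-group error handling (e.g. groups with too few villages for three quantile bins).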