python - Organizing CSV data and computing average grades
I have sample CSV data of classes and grades in 500 rows. It looks like this:
    courseid  title    teacher  avggpa  students  as    bs    cs    ds    fs
    101       math     stevens  3.15    105       25.2  45.1  16.7  10.1  2.9
    101       math     stevens  2.98    95        20.2  30.1  30.5  11.5  5.4
    101       math     smith    3.33    120       33.1  40.1  10.2  7.6   4.3
    103       english  jane     3.55    108       20.5  16.2  16.5  20.5  10.2
    103       english  jane     3.47    100       25.2  38.0  22.0  7.0   2.0
    202       science  roberts  2.67    80        12.0  35.0  27.5  12.5  8.3
(Pretend it is comma separated; I typed it this way for formatting purposes. The percentages don't add up to 100%, but pretend they do.)
What I have so far is:
    import csv

    with open(filename, 'rb') as f:
        reader = csv.reader(f, delimiter=',')
        next(reader, None)  # skip the header
        self.data = list(reader)

    case_list = []
    for entry in self.data:
        case = {'course_number': entry[1], 'course_title': entry[2], 'teacher': entry[3]...  # and so on for each header
        case_list.append(case)
So I have a list of dictionaries, where each dictionary is one row of the CSV file.
My goal is to combine and average the avggpa, As, Bs, Cs, Ds, and Fs for teachers who teach the same course more than once. In the example, I would average the grades of Stevens's and Jane's classes and represent them in a visual. If a teacher teaches a course only once, I would just represent those grades in the visual.
I'm struggling to come up with a method to determine whether a teacher teaches the same course more than once. I'm thinking of something along the lines of looping through the list, checking if the courseid and teacher are already in a dictionary, and calling a function to average the GPAs if so, but I can't seem to think out the logic.
Any help is appreciated, and if more clarification is needed please let me know. If there is a better approach to organizing the CSV data than what I did, please let me know!
First of all, remember that lists start indexing at 0, so the line in which you add each entry is off by 1 in every field. You should start at entry[0], not entry[1].
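One way to sidestep index mistakes like this is to let the csv module key each row by its header. The following is a minimal sketch, assuming Python 3 and a header row with the column names shown in the question; the SAMPLE text and the read_rows helper are hypothetical stand-ins for the real file and are not part of the original code.

```python
import csv
import io

# Hypothetical sample mirroring the question's columns (not the real file).
SAMPLE = """courseid,title,teacher,avggpa,students,as,bs,cs,ds,fs
101,math,stevens,3.15,105,25.2,45.1,16.7,10.1,2.9
103,english,jane,3.55,108,20.5,16.2,16.5,20.5,10.2
"""


def read_rows(handle):
    """Return one dict per CSV row, keyed by the header names."""
    return list(csv.DictReader(handle))


rows = read_rows(io.StringIO(SAMPLE))
first = rows[0]
# Fields are looked up by name, so there is no positional index to get wrong.
# Note that every value is still a string at this point.
print(first["courseid"], first["teacher"], first["avggpa"])
```

With a real file, the same helper could be called as read_rows(open(filename, newline='')) inside a with block.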
Anyway, you've organised a list of dicts, where each dict represents the stats of a given course. For your purposes, it would be better to organise the information in a single dict, where the key is the teacher's name plus the course id and the value is the total stats for that course. To do that, initialise an empty dict and iterate through the rows of the CSV, checking whether there is already an entry in the dict for the given teacher/courseid, updating it if so or adding it if not. Like this:
    stats = {}
    for entry in self.data:
        # convert the numeric fields from strings
        entry[3] = float(entry[3])  # avggpa
        entry[4] = float(entry[4])  # students
        key = entry[0] + entry[2]   # courseid + teacher
        # check whether this teacher/course is already in the dict
        if key not in stats:
            # add a new row
            stats[key] = {'total_students': entry[4], 'weighted_gpa': entry[4] * entry[3]}
        else:
            # update the existing row
            stats[key]['weighted_gpa'] += entry[4] * entry[3]
            stats[key]['total_students'] += entry[4]
Then you can run through the dictionary and average the GPAs:
    # iterate over the values; iterating over the dict itself yields only the string keys
    for teachercourse in stats.values():
        teachercourse['avg_gpa'] = teachercourse['weighted_gpa'] / teachercourse['total_students']
I kept it to average GPAs for readability, but you can add 'total_weighted_number_of_as' etc. to get the full list of desired stats.
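To illustrate how the same weighting might extend to the letter-grade columns, here is a rough sketch, assuming Python 3 and rows that are already dicts keyed by column name. The names FIELDS, COLUMNS, SAMPLE and average_by_teacher_course are hypothetical, the sample rows are copied from the question, and the key is a (courseid, teacher) tuple rather than a concatenated string. It also assumes every group has a positive student count.

```python
from collections import defaultdict

# Columns averaged per teacher/course, each weighted by class size.
FIELDS = ("avggpa", "as", "bs", "cs", "ds", "fs")


def average_by_teacher_course(rows):
    """Return student-weighted averages keyed by (courseid, teacher)."""
    totals = defaultdict(lambda: {"students": 0.0, **{f: 0.0 for f in FIELDS}})
    for row in rows:
        key = (row["courseid"], row["teacher"])
        students = float(row["students"])
        totals[key]["students"] += students
        for field in FIELDS:
            # Accumulate value * students so bigger classes count for more.
            totals[key][field] += students * float(row[field])
    # Divide each weighted sum by the group's total number of students.
    return {
        key: {field: t[field] / t["students"] for field in FIELDS}
        for key, t in totals.items()
    }


COLUMNS = ("courseid", "title", "teacher", "avggpa", "students",
           "as", "bs", "cs", "ds", "fs")
SAMPLE = [
    ("101", "math", "stevens", "3.15", "105", "25.2", "45.1", "16.7", "10.1", "2.9"),
    ("101", "math", "stevens", "2.98", "95", "20.2", "30.1", "30.5", "11.5", "5.4"),
    ("101", "math", "smith", "3.33", "120", "33.1", "40.1", "10.2", "7.6", "4.3"),
]
rows = [dict(zip(COLUMNS, values)) for values in SAMPLE]
averages = average_by_teacher_course(rows)
print(averages[("101", "stevens")])
```

A teacher who appears only once for a course, such as smith here, simply gets their own figures back, so the single-class case needs no special handling.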