Searching CSV files with Pandas (unique IDs) - Python
I am searching a CSV file with 242000 rows and want to count the unique identifiers in one of the columns. The column is named 'logid' and holds a number of different values, e.g. 1002, 3004, 5003. I want to search the CSV file using a pandas DataFrame, count the occurrences of each unique identifier, and if possible create a new CSV file that stores this information. For example, if I find there are 50 logids of 1004, the new CSV file should have a column named 1004 with the count of 50 displayed below it, and the same for the other unique identifiers in that same CSV file. I am new at this, and while I have done some searching, I have no idea where to start.
Thanks!
As you haven't posted any code, I can only describe a general way this would work:
1. Load the CSV file into a pd.DataFrame using pandas.read_csv.
2. Drop repeated values, keeping only the first occurrence of each, with pandas.DataFrame.drop_duplicates:
df1 = df.drop_duplicates(keep="first")
--> This returns a DataFrame containing only the row with the first occurrence of each duplicated value. E.g. if the value 1000 appears in 5 rows, the first of those rows is kept while the others are dropped.
--> df1.shape[0] then gives the number of unique values, and df.shape[0] - df1.shape[0] gives the number of duplicate rows that were dropped from df.
3. If you want to store the rows of df that contain each "duplicate value" in separate CSV files, you could do something like this:

import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 3, 0, 1, 2, 5, 5]})  # stands in for the original data set
print(df)

# I assume the column with the duplicate values is column "a";
# if you want to check the whole row, omit the subset keyword.
df1 = df.drop_duplicates(subset="a", keep="first")
print(df1)

frames = []
for m in df1["a"]:
    mask = (df == m)
    frames.append(df[mask].dropna())

for dfx in range(len(frames)):
    name = "file{0}".format(dfx)
    frames[dfx].to_csv(r"your path\{0}".format(name))
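Steps 1 and 2 above can be sketched end to end like this (a minimal example: io.StringIO stands in for your real CSV file, and the column name 'logid' and the sample values are taken from the question; with a real file you would pass its path to pd.read_csv):

```python
import io
import pandas as pd

# Stand-in for your real CSV file; with a real file: pd.read_csv("yourfile.csv").
csv_data = io.StringIO("logid\n1002\n3004\n1002\n5003\n1002\n3004\n")

# Step 1: load the CSV into a DataFrame.
df = pd.read_csv(csv_data)

# Step 2: keep only the first occurrence of each logid.
df1 = df.drop_duplicates(subset="logid", keep="first")

print(df1.shape[0])                 # number of unique logid values
print(df.shape[0] - df1.shape[0])   # number of duplicate rows dropped
```

With the six sample rows above, three logids are unique (1002, 3004, 5003) and three rows are duplicates.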
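Since the question actually asks for a count per logid written out as a CSV (one column per logid, its count below), it may be worth noting that pandas' Series.value_counts gives those counts directly. A sketch under the same assumptions (made-up sample data via io.StringIO; swap in pd.read_csv on your file path):

```python
import io
import pandas as pd

# Stand-in data; with a real file use pd.read_csv("yourfile.csv").
df = pd.read_csv(io.StringIO("logid\n1004\n1002\n1004\n3004\n1004\n"))

# Count how often each logid occurs.
counts = df["logid"].value_counts()
print(counts)

# Transpose to get one column per logid with its count below,
# as described in the question, then write it out.
wide = counts.to_frame().T
wide.to_csv("logid_counts.csv", index=False)
```

The resulting file has a header row of logid values and a single data row of counts, which matches the layout asked for in the question.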