Searching CSV files with Pandas (unique IDs) - Python
I am searching a CSV file with 242000 rows and want to count the unique identifiers in one of the columns. The column is named 'logid' and holds a number of different values, e.g. 1002, 3004, 5003. I want to search the CSV file using a pandas DataFrame, count the occurrences of each unique identifier, and if possible create a new CSV file that stores this information. For example, if I find there are 50 logids of 1004, the new CSV file should have a column named 1004 with the count of 50 displayed below it, and the same for the other unique identifiers in that same CSV file. I am new at this, and while I have done some searching, I have no idea where to start.
Thanks!
As you haven't posted any code, I can only describe a general way this would work:
1. Load the CSV file into a pd.DataFrame using pandas.read_csv.
2. Drop repeated values, keeping only the first occurrence of each, with pandas.DataFrame.drop_duplicates:
df1 = df.drop_duplicates(keep="first")
--> This returns a DataFrame containing only the row with the first occurrence of each duplicated value. E.g. if the value 1000 appears in 5 rows, the first of those rows is kept while the others are dropped.
--> df1.shape[0] then gives the number of unique values, and df.shape[0] - df1.shape[0] gives the number of duplicate rows that were dropped from df.
3. If you want to store the rows of df that contain each "duplicate value" in separate CSV files, you could do something like this:

import pandas as pd

df = pd.DataFrame({"a": [0, 1, 2, 3, 0, 1, 2, 5, 5]})  # stands in for the original data set
print(df)

# I assume the column with the duplicate values is column "a";
# if you want to check the whole row, omit the subset keyword.
df1 = df.drop_duplicates(subset="a", keep="first")
print(df1)

frames = []
for m in df1["a"]:
    mask = (df == m)
    frames.append(df[mask].dropna())

for dfx in range(len(frames)):
    name = "file{0}".format(dfx)
    frames[dfx].to_csv(r"your path\{0}".format(name))
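Steps 1 and 2 above can be sketched end to end like this (a minimal example: io.StringIO stands in for your real CSV file, and the column name 'logid' and the sample values are taken from the question; with a real file you would pass its path to pd.read_csv):

```python
import io
import pandas as pd

# Stand-in for your real CSV file; with a real file: pd.read_csv("yourfile.csv").
csv_data = io.StringIO("logid\n1002\n3004\n1002\n5003\n1002\n3004\n")

# Step 1: load the CSV into a DataFrame.
df = pd.read_csv(csv_data)

# Step 2: keep only the first occurrence of each logid.
df1 = df.drop_duplicates(subset="logid", keep="first")

print(df1.shape[0])                 # number of unique logid values
print(df.shape[0] - df1.shape[0])   # number of duplicate rows dropped
```

With the six sample rows above, three logids are unique (1002, 3004, 5003) and three rows are duplicates.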
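Since the question actually asks for a count per logid written out as a CSV (one column per logid, its count below), it may be worth noting that pandas' Series.value_counts gives those counts directly. A sketch under the same assumptions (made-up sample data via io.StringIO; swap in pd.read_csv on your file path):

```python
import io
import pandas as pd

# Stand-in data; with a real file use pd.read_csv("yourfile.csv").
df = pd.read_csv(io.StringIO("logid\n1004\n1002\n1004\n3004\n1004\n"))

# Count how often each logid occurs.
counts = df["logid"].value_counts()
print(counts)

# Transpose to get one column per logid with its count below,
# as described in the question, then write it out.
wide = counts.to_frame().T
wide.to_csv("logid_counts.csv", index=False)
```

The resulting file has a header row of logid values and a single data row of counts, which matches the layout asked for in the question.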