How can I speed up this Python script to read and process a CSV file?
I'm trying to process a relatively large (about 100k lines) CSV file in Python. The code looks like this:
#!/usr/bin/env python
import sys
reload(sys)
sys.setdefaultencoding("utf8")
import csv
import os

csvfilename = sys.argv[1]

with open(csvfilename, 'r') as inputfile:
    parsedfile = csv.DictReader(inputfile, delimiter=',')
    totalcount = 0
    for row in parsedfile:
        target = row['new']
        source = row['old']
        systemline = "some_curl_command {source}, {target}".format(source=source, target=target)
        os.system(systemline)
        totalcount += 1
        print "\nprocessed number: " + str(totalcount)
I'm not sure how to optimize this script. Should I use something besides DictReader?
I have to use Python 2.7 and cannot upgrade to Python 3.
If you want to avoid multiprocessing, it's possible to split the long CSV file into a few smaller CSVs and run them simultaneously:
$ python your_script.py 1.csv &
$ python your_script.py 2.csv &
The ampersand stands for background execution in Linux environments (more details here). I don't have enough knowledge to do something similar in Windows, but it's possible to just open a few cmd windows.
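If you'd rather do the splitting in Python instead of by hand, here's a rough sketch; the 50000-row chunk size is an arbitrary example, and the header row is repeated in every chunk so each piece stays a valid CSV:

import csv
import sys

CHUNK = 50000  # rows per output file; arbitrary example value
with open(sys.argv[1], 'r') as inputfile:
    reader = csv.reader(inputfile)
    header = next(reader)
    out, writer, count, index = None, None, 0, 0
    for row in reader:
        if count % CHUNK == 0:
            if out:
                out.close()
            index += 1
            out = open('{0}.csv'.format(index), 'wb')  # csv module wants binary mode on Python 2
            writer = csv.writer(out)
            writer.writerow(header)
        writer.writerow(row)
        count += 1
    if out:
        out.close()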
Anyway, it's better to stick with multiprocessing, of course.
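A minimal multiprocessing sketch, assuming the same 'old'/'new' columns and the same external command as in the question; the worker count of 8 is just an example to tune:

import csv
import os
import sys
from multiprocessing import Pool

def fetch(pair):
    # one download per task; same external call as in the question
    source, target = pair
    os.system("some_curl_command {0}, {1}".format(source, target))

if __name__ == "__main__":
    with open(sys.argv[1], 'r') as inputfile:
        pairs = [(row['old'], row['new']) for row in csv.DictReader(inputfile)]
    pool = Pool(processes=8)  # tune to your CPU and bandwidth
    pool.map(fetch, pairs)
    pool.close()
    pool.join()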
What about using requests instead of curl?

import requests

response = requests.get(source_url)
html = response.content
with open(target, "w") as file:
    file.write(html)
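If you go the requests route, reusing a single Session object keeps connections alive between downloads instead of reconnecting for every row, which is often a bigger win than the CSV parsing itself. A sketch, assuming the same 'old'/'new' columns as in the question:

import csv
import sys
import requests

session = requests.Session()  # reuses connections across requests
with open(sys.argv[1], 'r') as inputfile:
    for row in csv.DictReader(inputfile):
        response = session.get(row['old'])
        with open(row['new'], 'wb') as out:  # binary mode to keep the content intact
            out.write(response.content)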
- Avoid print statements; over a long run they're slow as hell. For development and debugging that's OK, but when you decide to start the final execution of the script, you can remove them and check the count of processed files directly in the target folder.
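If you still want some feedback during the final run, a compromise (just a sketch, reusing the totalcount variable from the question) is to print only every N rows:

totalcount += 1
if totalcount % 1000 == 0:  # report progress only occasionally, not per row
    print "processed: " + str(totalcount)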