How can I speed up this Python script to read and process a CSV file?


I'm trying to process a relatively large (about 100k lines) CSV file in Python. The code looks like this:

    #!/usr/bin/env python
    import sys
    reload(sys)
    sys.setdefaultencoding("utf8")
    import csv
    import os

    csvfilename = sys.argv[1]

    with open(csvfilename, 'r') as inputfile:
        parsedfile = csv.DictReader(inputfile, delimiter=',')
        totalcount = 0
        for row in parsedfile:
            target = row['new']
            source = row['old']
            systemline = "some_curl_command {source}, {target}".format(source=source, target=target)
            os.system(systemline)
            totalcount += 1
            print "\nprocessed number: " + str(totalcount)

I'm not sure how to optimize this script. Should I use something other than DictReader?

I have to use Python 2.7 and cannot upgrade to Python 3.

  1. If you want to avoid multiprocessing, it's possible to split the long CSV file into a few smaller CSVs and run them simultaneously:

    $ python your_script.py 1.csv &
    $ python your_script.py 2.csv &

The ampersand stands for background execution in Linux environments (more details here). I don't have enough knowledge to do something similar in Windows, but it's possible to just open a few cmd windows, lol.

Anyway, it's better to stick with multiprocessing, of course.
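For what it's worth, here's a minimal Python 2.7 sketch of that multiprocessing approach. It assumes the same 'old'/'new' columns as in the question and keeps the curl call as a stand-in for the real per-row work:

    #!/usr/bin/env python
    # Rough multiprocessing sketch (Python 2.7): run the per-row downloads in a
    # pool of worker processes instead of one at a time.
    import csv
    import os
    import sys
    from multiprocessing import Pool

    def process_row(row):
        # Stand-in for the original curl command from the question.
        source = row['old']
        target = row['new']
        os.system("some_curl_command {0} {1}".format(source, target))
        return target

    if __name__ == '__main__':
        with open(sys.argv[1], 'r') as inputfile:
            rows = list(csv.DictReader(inputfile, delimiter=','))
        pool = Pool(processes=8)  # tune the worker count to your machine and network
        for count, _ in enumerate(pool.imap_unordered(process_row, rows), 1):
            if count % 1000 == 0:  # occasional progress output instead of every row
                print "processed:", count
        pool.close()
        pool.join()

Reading all 100k rows into a list up front is fine at that size; the downloads, not the CSV parsing, are the bottleneck here.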

  2. What about using requests instead of curl?

    import requests

    response = requests.get(source_url)
    html = response.content
    with open(target, "w") as file:
        file.write(html)

Here are the docs.
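As a small extra: if many of the URLs point at the same host, reusing a single requests.Session avoids opening a fresh connection for every row. A hedged sketch (the fetch name and the 30-second timeout are just illustrative):

    import requests

    # One Session reuses the underlying connection pool across requests,
    # which is noticeably cheaper than opening a new connection per row.
    session = requests.Session()

    def fetch(source_url, target):
        response = session.get(source_url, timeout=30)
        response.raise_for_status()          # fail loudly on HTTP errors
        with open(target, "wb") as outfile:  # write bytes, since response.content is bytes
            outfile.write(response.content)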

  3. Avoid print statements; over a long run they're slow as hell. For development and debugging that's OK, but when you decide to start the final execution of the script you can remove them and check the count of processed files directly in the target folder.
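For example, instead of printing on every row, you can just count what's already in the output directory ("downloads" below is only a placeholder for wherever your targets are written):

    import os

    # Count the files already written instead of printing per iteration.
    # "downloads" is a placeholder path for the script's output directory.
    target_dir = "downloads"
    done = len([name for name in os.listdir(target_dir)
                if os.path.isfile(os.path.join(target_dir, name))])
    print "files processed so far:", done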
