How can I speed up this Python script to read and process a CSV file?
I'm trying to process a relatively large (about 100k lines) CSV file in Python. The code looks like this:
#!/usr/bin/env python
import sys
reload(sys)
sys.setdefaultencoding("utf8")
import csv
import os

csvfilename = sys.argv[1]

with open(csvfilename, 'r') as inputfile:
    parsedfile = csv.DictReader(inputfile, delimiter=',')
    totalcount = 0
    for row in parsedfile:
        target = row['new']
        source = row['old']
        systemline = "some_curl_command {source}, {target}".format(source=source, target=target)
        os.system(systemline)
        totalcount += 1
        print "\nprocessed number: " + str(totalcount)
I'm not sure how to optimize this script. Should I use something besides DictReader?
I have to use Python 2.7 and cannot upgrade to Python 3.
If you want to avoid multiprocessing, it's possible to split the long CSV file into a few smaller CSVs and run them simultaneously:
$ python your_script.py 1.csv &
$ python your_script.py 2.csv &
The ampersand stands for background execution in Linux environments (more details here). I don't have enough knowledge to do something similar in Windows, but it's possible to just open a few cmd windows.
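If you'd rather do the splitting in Python instead of by hand, here's a rough sketch; the 50000-row chunk size is an arbitrary example, and the header row is repeated in every chunk so each piece stays a valid CSV:

import csv
import sys

CHUNK = 50000  # rows per output file; arbitrary example value
with open(sys.argv[1], 'r') as inputfile:
    reader = csv.reader(inputfile)
    header = next(reader)
    out, writer, count, index = None, None, 0, 0
    for row in reader:
        if count % CHUNK == 0:
            if out:
                out.close()
            index += 1
            out = open('{0}.csv'.format(index), 'wb')  # csv module wants binary mode on Python 2
            writer = csv.writer(out)
            writer.writerow(header)
        writer.writerow(row)
        count += 1
    if out:
        out.close()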
Anyway, it's better to stick with multiprocessing, of course.
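A minimal multiprocessing sketch, assuming the same 'old'/'new' columns and the same external command as in the question; the worker count of 8 is just an example to tune:

import csv
import os
import sys
from multiprocessing import Pool

def fetch(pair):
    # one download per task; same external call as in the question
    source, target = pair
    os.system("some_curl_command {0}, {1}".format(source, target))

if __name__ == "__main__":
    with open(sys.argv[1], 'r') as inputfile:
        pairs = [(row['old'], row['new']) for row in csv.DictReader(inputfile)]
    pool = Pool(processes=8)  # tune to your CPU and bandwidth
    pool.map(fetch, pairs)
    pool.close()
    pool.join()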
What about using requests instead of curl?

import requests

response = requests.get(source_url)
html = response.content
with open(target, "w") as file:
    file.write(html)
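If you go the requests route, reusing a single Session object keeps connections alive between downloads instead of reconnecting for every row, which is often a bigger win than the CSV parsing itself. A sketch, assuming the same 'old'/'new' columns as in the question:

import csv
import sys
import requests

session = requests.Session()  # reuses connections across requests
with open(sys.argv[1], 'r') as inputfile:
    for row in csv.DictReader(inputfile):
        response = session.get(row['old'])
        with open(row['new'], 'wb') as out:  # binary mode to keep the content intact
            out.write(response.content)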
- Avoid print statements; over a long run they're slow as hell. For development and debugging that's OK, but when you decide to start the final execution of the script, you can remove them and check the count of processed files directly in the target folder.
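If you still want some feedback during the final run, a compromise (just a sketch, reusing the totalcount variable from the question) is to print only every N rows:

totalcount += 1
if totalcount % 1000 == 0:  # report progress only occasionally, not per row
    print "processed: " + str(totalcount)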