Python - Count URLs in console instead of progress bar
When I run the progress bar part of my web scraper, it appears to be both (a) inaccurate and (b) slows the whole process down.
    import click

    with click.progressbar(range(1000000)) as bar:
        for i in bar:
            pass

Is there an article or tutorial I can read to better understand how to print progress to the console?
I want the program to scan each URL in the list and print its progress as it iterates through the list, along the lines of:

    Scanning URL 1 of 30
    Scanning URL 2 of 30
    Scanning URL 3 of 30
If possible, this would stay on the same line, but that isn't essential.
My code is below. If you can help with either a tutorial or some reading, it would be appreciated.
    import requests
    import csv
    from lxml import html

    url_list = [
        "https://www.realestate.com.au/property/1-1-goldsmith-st-elwood-vic-3184",
        "https://www.realestate.com.au/property/1-10-albion-rd-glen-iris-vic-3146",
        "https://www.realestate.com.au/property/1-109-sydney-rd-manly-nsw-2095",
        "https://www.realestate.com.au/property/1-1110-glen-huntly-rd-glen-huntly-vic-3163",
    ]

    with open('test.csv', 'wb') as csv_file:
        writer = csv.writer(csv_file)
        for index, url in enumerate(url_list):
            page = requests.get(url)
            print 'scanning url....'
            # text2search is assumed to be defined elsewhere in the script (not shown)
            if text2search in page.text:
                tree = html.fromstring(page.content)
                (title,) = (x.text_content() for x in tree.xpath('//title'))
                (price,) = (x.text_content() for x in tree.xpath('//div[@class="property-value__price"]'))
                (sold,) = (x.text_content().strip() for x in tree.xpath('//p[@class="property-value__agent"]'))
                writer.writerow([title, price, sold])
If you want to print an indicator other than a progress bar to show how far along you are, the easiest way is with regular prints.
Since the code in the question is Python 2, I've answered with Python 2 code, but as this question may come up for Python 3 users, I've added a section for them too.
A version for Python 2
The following is based on your loop and should complement the code in the question:
    for index, url in enumerate(url_list):
        print 'Scanning URL #' + str(index+1) + ' of ' + str(len(url_list))

You can optionally add the URL you're scanning, using the url variable that the for loop generates.
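As a small variation (not part of the original answer), the manual str() calls and the index+1 offset can be avoided by letting enumerate start counting at 1 and building the message with str.format. progress_message is a hypothetical helper name and the URLs are placeholders; the optional url argument covers the "add the URL you're scanning" case. This should run unchanged on Python 2.7 as well as Python 3.

```python
def progress_message(index, total, url=None):
    """Return e.g. 'Scanning URL #2 of 30', with the URL appended if given."""
    message = 'Scanning URL #{} of {}'.format(index, total)
    if url is not None:
        message += ': ' + url
    return message


url_list = ['https://example.com/a', 'https://example.com/b']  # placeholders

# start=1 makes enumerate count from 1, so no index+1 is needed
for index, url in enumerate(url_list, start=1):
    print(progress_message(index, len(url_list), url))
```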
Also, if you want each print to replace the last one, you can add a comma (,) at the end of the print statement and a \r character at the beginning:
    for index, url in enumerate(url_list):
        print '\rScanning URL #' + str(index+1) + ' of ' + str(len(url_list)),

The comma prevents print from adding a newline character (\n) at the end, and the \r ("carriage return") at the beginning moves the cursor back to the start of the line, so the new text is written over what was there before.
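One caveat: \r only moves the cursor back, it does not clear the line, so if a new message is shorter than the previous one (say, a short URL after a long one), the tail of the old message stays visible. A minimal sketch of one workaround, padding each message with spaces to the length of the previous one; LineOverwriter is a hypothetical helper, and it assumes each message fits on a single terminal line. It writes with sys.stdout.write rather than print, so it behaves the same on Python 2.7 and Python 3.

```python
import sys


class LineOverwriter(object):
    """Rewrite one console line in place with a carriage return plus padding."""

    def __init__(self, stream=None):
        self.stream = stream if stream is not None else sys.stdout
        self.last_width = 0

    def write(self, text):
        # Pad with spaces so a shorter message fully covers a longer previous one
        self.stream.write('\r' + text.ljust(self.last_width))
        self.stream.flush()
        self.last_width = len(text)

    def finish(self):
        # Move to a fresh line once the loop is done
        self.stream.write('\n')
        self.stream.flush()


if __name__ == '__main__':
    urls = ['https://example.com/a-rather-long-address', 'https://e.com']
    line = LineOverwriter()
    for index, url in enumerate(urls, start=1):
        line.write('Scanning URL #{} of {}: {}'.format(index, len(urls), url))
    line.finish()
```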
Differences in print between Python 2 and Python 3
It's important to note that print works quite differently in Python 2 and Python 3, and the 'Python 2' solution above will not work in Python 3.
For one thing, print in Python 3 is a function, not a statement, so it has to be called as a function (i.e. print('print me!')). Secondly, adding a comma at the end will not prevent the output of a newline character. Normally, a trailing comma has no visible effect, although the interpreter does evaluate it (as a tuple containing a single None), which can be seen in the Python REPL. Instead, you must pass a named argument (end) to the print function to override its default.
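The tuple and the still-present newline can be observed directly. This small Python 3 check captures what print('print me!'), writes and what it evaluates to, then contrasts it with end=''; contextlib.redirect_stdout and io.StringIO are standard-library tools used here only to capture the output.

```python
import contextlib
import io

# Python 3: a trailing comma does not suppress the newline
with_comma = io.StringIO()
with contextlib.redirect_stdout(with_comma):
    # The trailing comma makes the right-hand side a 1-tuple: (print(...),)
    result = print('print me!'),

print(repr(with_comma.getvalue()))  # → 'print me!\n'
print(result)                       # → (None,)

# The Python 3 way: override the default line ending with end=''
with_end = io.StringIO()
with contextlib.redirect_stdout(with_end):
    print('print me!', end='')

print(repr(with_end.getvalue()))    # → 'print me!'
```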
A version for Python 3
Here's the Python 3 equivalent of the code supplied at the top of this answer:
    for index, url in enumerate(url_list):
        print('Scanning URL #' + str(index+1) + ' of ' + str(len(url_list)))

And if you want each print to reuse the same line, as in the second example above:
    for index, url in enumerate(url_list):
        print('\rScanning URL #' + str(index+1) + ' of ' + str(len(url_list)), end='')

In case you didn't read the section above: end='' overrides the print function's default behavior of adding a \n (newline) character at the end of each line and adds an empty string instead, while the \r (carriage return) character at the beginning of the string moves the cursor back to the start of the line, so the rest of the string overwrites what was printed before.
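Two practical details may be worth adding to the same-line version. When standard output is a terminal it is usually line-buffered, so text printed with end='' may not appear until a newline is written; passing flush=True (available since Python 3.3) pushes it out immediately. And once the loop ends, one plain print() moves the cursor off the progress line so later output starts on a fresh line. A minimal sketch, with report_progress as a hypothetical helper name and the actual scraping left as a placeholder comment:

```python
import sys


def report_progress(url_list, stream=None):
    """Print 'Scanning URL #i of n' on one reused line, then end that line."""
    stream = stream if stream is not None else sys.stdout
    total = len(url_list)
    for index, url in enumerate(url_list, start=1):
        # \r returns to the line start; end='' suppresses the newline;
        # flush=True makes the text appear despite line buffering
        print('\rScanning URL #{} of {}'.format(index, total),
              end='', flush=True, file=stream)
        # ... fetch and parse `url` here ...
    print(file=stream)  # finish the progress line with a newline


if __name__ == '__main__':
    report_progress(['https://example.com/a', 'https://example.com/b'])
```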