python - How can I efficiently use WebDriver to scrape pages and store their source in a DataFrame?
I have a DataFrame containing hundreds of URLs. I visit each URL with Selenium and store the page source in a corresponding column. However, starting a new web driver for every visit seems slow. I am thinking of starting the web driver once before calling apply, but I don't know how to pass 'driver' to the function that apply calls. Is there a better way to do this?
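A minimal sketch of passing one pre-started driver into the function that apply calls. pandas' `Series.apply` forwards extra positional arguments from its `args=` tuple (and any keyword arguments) to the function, and `functools.partial` is an equivalent alternative. `StubDriver` and `fetch_source` are hypothetical names: the stub only mimics `get()` and `page_source` so the example runs without a browser; in practice you would pass a real Selenium driver instead.

```python
import functools

import pandas as pd


class StubDriver:
    """Hypothetical stand-in for a Selenium WebDriver so this runs without a
    browser; it only mimics .get() and the .page_source attribute."""

    def __init__(self):
        self.page_source = ""
        self.visits = 0

    def get(self, url):
        self.visits += 1
        self.page_source = f"<html>{url}</html>"


def fetch_source(url, driver):
    # Reuses the driver it is given instead of starting a new one per URL.
    driver.get(url)
    return driver.page_source


driver = StubDriver()  # started once, before apply
df = pd.DataFrame({"page_url": ["http://a.example", "http://b.example"]})

# Option 1: Series.apply passes the args= tuple after each URL.
df["source_page"] = df["page_url"].apply(fetch_source, args=(driver,))

# Option 2: bind the driver up front with functools.partial.
df["source_page_2"] = df["page_url"].apply(
    functools.partial(fetch_source, driver=driver)
)
```

Both options call the same function with the same driver object, so only one browser process would be started for the whole column.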
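    from selenium import webdriver

    def selenium_download(url):
        # A new PhantomJS process is started for every URL, which is what makes this slow.
        # (webdriver.PhantomJS exists only in Selenium 3.x; Selenium 4 removed it.)
        driver = webdriver.PhantomJS()
        try:
            driver.set_page_load_timeout(10)
            driver.get(url)
            source_page = driver.page_source
        except Exception:
            # Any failure, including a page-load timeout, is recorded as 'time-out'.
            source_page = 'time-out'
        finally:
            driver.quit()  # always shut the browser down, success or failure
        return source_page

    def revisit_urls(df):
        df['source_page'] = df['page_url'].apply(selenium_download)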
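An alternative to the per-URL driver in the code above, sketched under some assumptions: `revisit_urls` takes a hypothetical `make_driver` factory argument (with Selenium 3 this could be the `webdriver.PhantomJS` class itself), starts a single driver, sets the page-load timeout once, keeps the original 'time-out' fallback for each URL, and quits the driver in a `finally` block. A hypothetical `StubDriver` is used so the sketch runs offline; it raises `TimeoutError` for URLs containing "slow" to imitate a timeout.

```python
import pandas as pd


class StubDriver:
    """Hypothetical stand-in for a Selenium WebDriver (no browser needed).
    URLs containing 'slow' raise an error to simulate a page-load timeout."""

    def __init__(self):
        self.page_source = ""
        self.timeout = None
        self.quit_called = False

    def set_page_load_timeout(self, seconds):
        self.timeout = seconds

    def get(self, url):
        if "slow" in url:
            raise TimeoutError(f"timed out loading {url}")
        self.page_source = f"<html>{url}</html>"

    def quit(self):
        self.quit_called = True


def revisit_urls(df, make_driver, timeout=10):
    """Fill df['source_page'] using ONE driver for every URL."""
    driver = make_driver()  # started once, not once per URL
    driver.set_page_load_timeout(timeout)

    def download(url):
        try:
            driver.get(url)
            return driver.page_source
        except Exception:  # with real Selenium, TimeoutException is the usual case
            return "time-out"

    try:
        df["source_page"] = df["page_url"].apply(download)
    finally:
        driver.quit()  # release the browser even if apply raises
    return df
```

With real Selenium, the broad `except Exception` could be narrowed to `selenium.common.exceptions.TimeoutException`. Whether a driver remains usable after a page-load timeout can depend on the browser and driver version, so it is worth checking against your own URLs.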