Web Scraping - unable to print phone numbers using Python and BeautifulSoup

June 15, 2011

I'm attempting to scrape data from a real-estate agent page for a project.

I'm able to get both the name and the job description for all agents, but only a small number of the phone numbers.

This is my code:

from urllib.request import urlopen as ureq
from bs4 import BeautifulSoup as soup

my_url = 'https://www.raywhite.com/contact/?type=people&target=people&suburb=sydney%2c+nsw+2000&radius=5&firstname=&lastname=&_so=people'

# opening connection, grabbing the page
uclient = ureq(my_url)
page_html = uclient.read()
uclient.close()

# html parsing
page_soup = soup(page_html, "html.parser")

# one container per agent card
containers = page_soup.find_all("div", {"class": "card horizontal-split vcard"})

for container in containers:
    agent_name = container.find_all("li", {"class": "agent-name"})
    name = agent_name[0].text

    agent_role = container.find_all("li", {"class": "agent-role"})
    role = agent_role[0].text

    phone = container.find("a").text

    print("name: " + name)
    print("role: " + role)
    print("phone: " + phone)

This is a sample of the first few printed results; only the first 2 agents have their phone numbers listed:

name: mark constantine
role: principal
phone: 0418 222 643
name: dawn veloskey
role: operations manager
phone: 0418 449 600
name: yvonne lau
role: sales
phone:
name: anthony cavallaro
role: managing director | selling principal
phone:
name: ciara oconnor
role: sales executive
phone:
name: michael buium
role: commercial sales manager , auctioneer
phone:
name: albert hui
role: senior commercial property manager
phone:
name: jessie yee
role: associate director, commercial leasing & management
phone:

I'm not sure why the other phone numbers are not being printed; any suggestions would be appreciated.

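That's because the first 2 agents don't have photographs. For everyone else the photo is wrapped in the first "a" tag, so container.find("a") returns the photo link, whose text is empty, instead of the phone link.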
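To see this without requesting the live site, here is a minimal sketch that parses a hypothetical, simplified version of two agent cards. The question does not show the page's real HTML, so the markup below (the photo link, its href, the `tel:` hrefs and the second phone number) is made up for illustration. `find("a")` simply returns the first anchor in each card:

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified markup (assumption): the first card has no photo,
# the second card has a photo link before its phone link.
# The second phone number is a made-up placeholder.
HTML = """
<div class="card horizontal-split vcard">
  <ul>
    <li class="agent-name">mark constantine</li>
    <li class="agent-role">principal</li>
  </ul>
  <a href="tel:0418222643">0418 222 643</a>
</div>
<div class="card horizontal-split vcard">
  <a href="/agent/yvonne-lau"><img src="yvonne.jpg" alt=""></a>
  <ul>
    <li class="agent-name">yvonne lau</li>
    <li class="agent-role">sales</li>
  </ul>
  <a href="tel:0400000000">0400 000 000</a>
</div>
"""

page_soup = BeautifulSoup(HTML, "html.parser")
cards = page_soup.find_all("div", {"class": "card horizontal-split vcard"})

# find("a") returns the first <a> in the card, whatever it wraps.
first_links = [card.find("a") for card in cards]
for link in first_links:
    print(repr(link["href"]), repr(link.text))
```

For the second card the first anchor is the photo link, and an `<a>` that only contains an `<img>` has empty text, which matches the blank `phone:` lines above.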

Replace:

phone = container.find("a").text

with:

# keep only <a> tags whose href starts with "tel"
filterfn = lambda x: 'href' in x.attrs and x['href'].startswith("tel")
phones = map(lambda x: x.text, filter(filterfn, container.find_all("a")))
for phone in phones:
    print("phone number: " + phone)
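The same filter can also be expressed by passing a compiled regular expression as the `href` argument of `find_all`; BeautifulSoup tests the attribute value with the pattern's `search()` method, and anchors without an `href` never match. This is a minimal sketch, assuming (as the answer does) that phone links have an href starting with "tel". `extract_phones` is a hypothetical helper name and the card markup is made up for illustration:

```python
import re

from bs4 import BeautifulSoup

# Mirrors the answer's startswith("tel") check.
TEL_HREF = re.compile(r"^tel")


def extract_phones(container):
    """Return the stripped text of every <a> in container whose href starts with "tel"."""
    # <a> tags without an href attribute are skipped automatically.
    return [a.get_text(strip=True) for a in container.find_all("a", href=TEL_HREF)]


# Hypothetical card with a photo link before the phone link (placeholder number).
card_html = """
<div class="card horizontal-split vcard">
  <a href="/agent/example"><img src="photo.jpg" alt=""></a>
  <li class="agent-name">example agent</li>
  <a href="tel:0400000000">0400 000 000</a>
</div>
"""
card = BeautifulSoup(card_html, "html.parser").find("div")

phones = extract_phones(card)
for phone in phones:
    print("phone number: " + phone)
```

Returning a list rather than a `map` object also means the phones can be iterated more than once, for example to print them and then store them.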
