Web Scraping - unable to print phone numbers using Python and BeautifulSoup
I am attempting to scrape data from a real-estate agent page for a project.
I am able to get both the name and the job description for all agents, but only a small number of phone numbers.
This is my code:
    from urllib.request import urlopen as ureq
    from bs4 import BeautifulSoup as soup

    my_url = 'https://www.raywhite.com/contact/?type=people&target=people&suburb=sydney%2c+nsw+2000&radius=5&firstname=&lastname=&_so=people'

    # opening connection
    uclient = ureq(my_url)
    page_html = uclient.read()
    uclient.close()

    page_soup = soup(page_html, "html.parser")
    containers = page_soup.find_all("div", {"class": "card horizontal-split vcard"})

    for container in containers:
        agent_name = container.find_all("li", {"class": "agent-name"})
        name = agent_name[0].text
        agent_role = container.find_all("li", {"class": "agent-role"})
        role = agent_role[0].text
        phone = container.find("a").text
        print("name: " + name)
        print("role: " + role)
        print("phone: " + phone)
This is a sample of the first few results printed. Only the first 2 agents have phone numbers listed:
    name: mark constantine
    role: principal
    phone: 0418 222 643
    name: dawn veloskey
    role: operations manager
    phone: 0418 449 600
    name: yvonne lau
    role: sales
    phone:
    name: anthony cavallaro
    role: managing director | selling principal
    phone:
    name: ciara oconnor
    role: sales executive
    phone:
    name: michael buium
    role: commercial sales manager , auctioneer
    phone:
    name: albert hui
    role: senior commercial property manager
    phone:
    name: jessie yee
    role: associate director, commercial leasing & management
    phone:
I'm not sure why the other phone numbers are not being printed. Any suggestions would be appreciated.
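That's because the first 2 agents don't have photographs. For the others, the photo is the first "a" tag in the container, so find("a") returns the photo link (which has no text) instead of the phone link.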
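To see the cause in isolation, here is a minimal sketch. The markup is made up for illustration (it is an assumption, not the real page's structure): a card where a photo link comes before the phone link.

```python
from bs4 import BeautifulSoup

# Hypothetical card markup: the photo link comes before the phone link.
card_html = """
<div class="card">
  <a href="/agents/example"><img src="photo.jpg"></a>
  <a href="tel:0400000000">0400 000 000</a>
</div>
"""

card = BeautifulSoup(card_html, "html.parser")

# find("a") returns the first <a> in document order, which is the photo link.
first_link = card.find("a")
print(first_link["href"])     # the photo link's target, not a tel: link
print(repr(first_link.text))  # empty string: the link only wraps an <img>
```

With a card like this, the printed phone would be blank even though a phone link exists further down.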
Replace:

    phone = container.find("a").text

with:

    filterfn = lambda x: 'href' in x.attrs and x['href'].startswith("tel")
    phones = map(lambda x: x.text, filter(filterfn, container.find_all("a")))
    for phone in phones:
        print("phone number: " + phone)
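As an alternative, a CSS attribute selector can pick out the tel: links directly. This is only a sketch: the markup and the helper name extract_phones are hypothetical, and I'm assuming a bs4 version where select() supports attribute-prefix selectors.

```python
from bs4 import BeautifulSoup

def extract_phones(container):
    """Return the text of every <a> in container whose href starts with "tel"."""
    return [a.get_text(strip=True) for a in container.select('a[href^="tel"]')]

# Hypothetical markup: one card with a photo and a phone, one with no phone.
html = """
<div class="card">
  <a href="/agents/example"><img src="photo.jpg"></a>
  <a href="tel:0400000000">0400 000 000</a>
</div>
<div class="card">
  <a href="/agents/other">Profile</a>
</div>
"""

page = BeautifulSoup(html, "html.parser")
cards = page.find_all("div", {"class": "card"})
for card in cards:
    phones = extract_phones(card)
    # Fall back to a placeholder when the card has no tel: link at all.
    print("phone: " + (", ".join(phones) if phones else "not listed"))
```

Returning a list also handles agents with more than one number, or none, without raising an error.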