python - findChildren() method is storing two of the same child rather than one
I am opening a webpage with urllib2 and scraping it with BeautifulSoup, and I am trying to store specific text from within the webpage.
Before you see the code, here is a link to a screenshot of the HTML of the webpage, so you can understand the way I am using the find function of BeautifulSoup:
And finally, here is the code I am using:
    from BeautifulSoup import BeautifulSoup
    import urllib2

    url = 'http://www.sciencekids.co.nz/sciencefacts/animals/bird.html'
    page = urllib2.urlopen(url)
    soup = BeautifulSoup(page.read())

    # Locate the list that holds the facts, then collect its children
    ul = soup.find('ul', {'class': 'style33'})
    children = ul.findChildren()
    for child in children:
        print child.text
And here is the output, which is where the problem lies:
    birds have feathers, wings, lay eggs , warm blooded.
    birds have feathers, wings, lay eggs , warm blooded.
    there around 10000 different species of birds worldwide.
    there around 10000 different species of birds worldwide.
    ostrich largest bird in world. lays largest eggs , has fastest maximum running speed (97 kph).
    ostrich largest bird in world. lays largest eggs , has fastest maximum running speed (97 kph).
    scientists believe birds evolved theropod dinosaurs.
    scientists believe birds evolved theropod dinosaurs.
    birds have hollow bones them fly.
    birds have hollow bones them fly.
    bird species intelligent enough create , use tools.
    bird species intelligent enough create , use tools.
    chicken common species of bird found in world.
    chicken common species of bird found in world.
    kiwis endangered, flightless birds live in new zealand. lay largest eggs relative body size of bird in world.
    kiwis endangered, flightless birds live in new zealand. lay largest eggs relative body size of bird in world.
    hummingbirds can fly backwards.
    hummingbirds can fly backwards.
    bee hummingbird smallest living bird in world, length of 5 cm (2 in).
    bee hummingbird smallest living bird in world, length of 5 cm (2 in).
    around 20% of bird species migrate long distances every year.
    around 20% of bird species migrate long distances every year.
    homing pigeons bred find way home long distances away , have been used thousands of years carry messages.
    homing pigeons bred find way home long distances away , have been used thousands of years carry messages.
Is there something I am using or doing incorrectly in the code that produces two children where there should be one? It would be easy to write code that doesn't store duplicates of the same information, but I'd rather do it the right way so that I get just one of each string I am looking for.
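children = ul.findChildren() is selecting both the <li> and the <p> within the <ul>. findChildren() searches recursively by default, so it returns every descendant tag, not only the direct children. Iterating over children therefore prints the text property of both of these elements, and since each <li> contains a single <p>, the same text appears twice. To fix this, just change children = ul.findChildren() to children = ul.findChildren("p").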
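The behaviour described in the answer can be reproduced on a small fragment. This is a minimal sketch, not the asker's original setup: it assumes the newer bs4 package (BeautifulSoup 4) instead of the BeautifulSoup 3 import in the question, uses find_all, which is bs4's name for the same search, and runs against a made-up two-item list whose structure (each <li> wrapping one <p>) is inferred from the answer rather than taken from the real page.

```python
from bs4 import BeautifulSoup

# Hypothetical fragment modelled on the structure the answer describes:
# each <li> wraps a single <p>. The real page is not fetched here.
html = (
    '<ul class="style33">'
    "<li><p>Hummingbirds can fly backwards.</p></li>"
    "<li><p>Birds have hollow bones.</p></li>"
    "</ul>"
)

soup = BeautifulSoup(html, "html.parser")
ul = soup.find("ul", {"class": "style33"})

# No filter: every descendant tag is returned (<li>, <p>, <li>, <p>),
# so each fact shows up once for the <li> and once for the nested <p>.
all_texts = [tag.get_text(strip=True) for tag in ul.find_all()]

# Filter by tag name: only the <p> elements, one per fact.
p_texts = [tag.get_text(strip=True) for tag in ul.find_all("p")]

# Alternative: restrict the search to direct children of the <ul>.
li_texts = [tag.get_text(strip=True) for tag in ul.find_all("li", recursive=False)]

print(all_texts)
print(p_texts)
print(li_texts)
```

Filtering on "p" and restricting the search with recursive=False both give one string per fact here. Which one suits the real page better depends on whether every <li> there actually contains exactly one <p>.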