BeautifulSoup - Python: select a specific section from a very long div class output
I'm pulling info from a website, and the output is very long. How can I select the key part I'm interested in and assign it to a new object?
Here's the part of the code I'm using to pull the info:
soup = bs(response.text, "html.parser")
cartl = soup.find("div", {"class": "product-view"})
cart_link = cartl.find_all("form")
This is the long output (I shortened it for this example; the full text it pulls is about 100 lines):
<form action="https://www.randomsite.com/checkout/cart/add/uenc/ahr0chm6ly93d3cudghlz29vzhdpbgxvdxquy29tl25pa2utywlylwpvcmrhbi0xmy1yzxryby1izy1oaxn0b3j5lw9mlwzsawdodc13agl0zs1tzxrhbgljlxnpbhzlci11bml2zxjzaxr5lxjlzc00mtq1nzqtmtazp19fx1njrd1v/product/92797/form_key/nblk6ie3lydwf0vh/" id="product_addtocart_form" method="post">
<input name="form_key" type="hidden" value="nblk6ie3lydwf0vh"/>
<div class="no-display">
<input name="product" type="hidden" value="92797"/>
<input id="related-products-field" name="related_product" type="hidden" value=""/>
</div>
I want to take this URL and assign it to a new object:
https://www.randomsite.com/checkout/cart/add/uenc/ahr0chm6ly93d3cudghlz29vzhdpbgxvdxquy29tl25pa2utywlylwpvcmrhbi0xmy1yzxryby1izy1oaxn0b3j5lw9mlwzsawdodc13agl0zs1tzxrhbgljlxnpbhzlci11bml2zxjzaxr5lxjlzc00mtq1nzqtmtazp19fx1njrd1v/product/92797/form_key/nblk6ie3lydwf0vh/
This is my updated code, based on the answer below:
from bs4 import BeautifulSoup
import requests

session = requests.session()

endpoint = "https://randomsite.com/"
response = session.get(endpoint)

soup0 = BeautifulSoup(response.text, "html.parser")
div = soup0.find("div", {"class": "product-view"})
html = div.find("form")

soup = BeautifulSoup(html, 'html.parser')
form = soup.find('form', { 'id': 'product_addtocart_form' })
action = form['action']
print(action)
This is the new error I'm getting. Any idea where I'm going wrong?
Traceback (most recent call last):
  File "test.py", line 16, in <module>
    soup = BeautifulSoup(html, 'html.parser')
  File "/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/bs4/__init__.py", line 191, in __init__
    markup = markup.read()
TypeError: 'NoneType' object is not callable
You can use BeautifulSoup's find method to reference the <form> tag (optionally filtering on a particular id in case there are multiple forms on the page). Then, treat the form object like a dictionary to pull the action attribute.
Code:
from bs4 import BeautifulSoup

html = '''
<form action="https://www.randomsite.com/checkout/cart/add/uenc/ahr0chm6ly93d3cudghlz29vzhdpbgxvdxquy29tl25pa2utywlylwpvcmrhbi0xmy1yzxryby1izy1oaxn0b3j5lw9mlwzsawdodc13agl0zs1tzxrhbgljlxnpbhzlci11bml2zxjzaxr5lxjlzc00mtq1nzqtmtazp19fx1njrd1v/product/92797/form_key/nblk6ie3lydwf0vh/" id="product_addtocart_form" method="post">
<input name="form_key" type="hidden" value="nblk6ie3lydwf0vh"/>
<div class="no-display">
<input name="product" type="hidden" value="92797"/>
<input id="related-products-field" name="related_product" type="hidden" value=""/>
</div>
'''

# Parse the markup and locate the form by its id
soup = BeautifulSoup(html, 'html.parser')
form = soup.find('form', { 'id': 'product_addtocart_form' })

# Tag attributes are read with dictionary-style access
action = form['action']
print(action)
Output:
https://www.randomsite.com/checkout/cart/add/uenc/ahr0chm6ly93d3cudghlz29vzhdpbgxvdxquy29tl25pa2utywlylwpvcmrhbi0xmy1yzxryby1izy1oaxn0b3j5lw9mlwzsawdodc13agl0zs1tzxrhbgljlxnpbhzlci11bml2zxjzaxr5lxjlzc00mtq1nzqtmtazp19fx1njrd1v/product/92797/form_key/nblk6ie3lydwf0vh/
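Regarding the TypeError in the updated code from the question: it most likely comes from passing the result of div.find("form") into BeautifulSoup a second time. find returns a Tag object rather than a string of HTML, and a Tag can be searched directly, so the second parse is not needed. Here is a minimal sketch of that approach. The markup is a stand-in, the action URL is a shortened placeholder rather than the real one, and the None checks are an extra precaution I added in case the div or form is missing from the page.

```python
from bs4 import BeautifulSoup

# Stand-in markup; the action URL is a shortened placeholder, not the real one.
html = '''
<div class="product-view">
  <form action="https://www.randomsite.com/checkout/cart/add/product/92797/" id="product_addtocart_form" method="post">
    <input name="form_key" type="hidden" value="nblk6ie3lydwf0vh"/>
  </form>
</div>
'''

soup = BeautifulSoup(html, "html.parser")

# find() returns a Tag (or None when nothing matches), not a string.
div = soup.find("div", {"class": "product-view"})

# A Tag can be searched directly, so there is no need to re-parse it.
form = None
if div is not None:
    form = div.find("form", {"id": "product_addtocart_form"})

# Guard against a missing form before reading the attribute.
action = form["action"] if form is not None else None
print(action)
```

With a live page, response.text would take the place of the html string, and the rest should work the same way as long as the page actually contains that div and form.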