Web scraping - Using requests with a simple form in Python
I'm trying to scrape example sentences for a specific French word using Python, but the page that Python retrieves doesn't seem to have any results.
I've inspected the elements of the search box and the search button and included them as parameters. Perhaps I'm missing something?
http://www.online-languages.info/french/examples.php
    import requests
    from bs4 import BeautifulSoup

    word = 'manger'
    url = 'http://www.online-languages.info/french/examples.php'
    # Field names taken from the search box and the search button
    params = {'word': word, 'go': ''}
    response = requests.post(url, data=params)
    soup = BeautifulSoup(response.text, 'html5lib')
    print(soup.prettify())
Edit: Here is the output of the result. It appears the page may be using JavaScript. If that's the case, do I have to use a different library?
<!doctype html public "-//w3c//dtd xhtml 1.0 transitional//en" "http://www.w3.org/tr/xhtml1/dtd/xhtml1-transitional.dtd"> <html dir="ltr" lang="en" xml:lang="en" xmlns="http://www.w3.org/1999/xhtml"> <head> <title> french example sentences :: online-languages.info </title> <meta content="text/css" http-equiv="content-style-type"/> <meta content="text/html; charset=utf-8" http-equiv="content-type"/> <meta content="database containing thousands of example sentences. sentences important learning correct use of words." name="description"/> <meta content="french language. french grammar. french vocabulary. tests. language certificate. verbs. french phrases. french pronunciation. e-learning. conversation." name="subject"/> <meta content="french, french grammar, french dictionary, french vocabulary, french language, tests, french test, exam, fce, verbs, exercise, certificate, course, games" name="keywords"/> <link href="../style.css" rel="stylesheet" type="text/css"/> </head> <body style="background-image:url(./img/bg2.jpg);"> <div align="center"> <table bgcolor="white" border="0" cellpadding="6" cellspacing="0" style="-moz-border-radius:20px;" width="1000"> <tbody> <tr> <td align="center" colspan="4"> <table border="0" cellspacing="0" width="100%"> <tbody> <tr> <td align="center" width="180"> <a href="../"> <img alt="online-languages.info" border="0" src="img/logo.png"/> </a> </td> <td align="left" style="background: url('img/bg.png'); -moz-border-radius:20px; padding: 20px 20px 20px 20px; "> <h1 style="color:#fff; font-size:20pt;"> french words in example sentences </h1> <h3 style="color:#fff; font-size:8pt; font-weight:normal;"> french language resources @ <a href="http://www.online-languages.info" style="color:white;"> online-languages.info </a> </h3> </td> </tr> </tbody> </table> </td> </tr> <tr> <td align="left" valign="top" width="180"> <table cellpadding="0" cellspacing="0" class="t2" width="180"> <tbody> <tr> <td> <a class="arect" href="index.php"> home </a> 
</td> </tr> <tr> <td> <a class="arect" href="grammar.php"> french grammar </a> </td> </tr> <tr> <td> <a class="arect" href="phrases.php"> french phrases </a> </td> </tr> <tr> <td> <a class="arect" href="vocabulary.php"> french vocabulary </a> </td> </tr> <tr> <td> <a class="arect" href="trainer.php"> vocabulary trainer </a> </td> </tr> <tr> <td> <a class="arect" href="picture-dictionary.php"> picture dictionary </a> </td> </tr> <tr> <td> <a class="arect" href="dictionary.php"> french dictionary </a> </td> </tr> <tr> <td> <a class="arect" href="flashcards.php"> flashcards </a> </td> </tr> <tr> <td> <a class="arect" href="audio.php"> audio </a> </td> </tr> <tr> <td> <a class="arect" href="video.php"> video </a> </td> </tr> <tr> <td> <a class="arect" href="translator.php"> french translator </a> </td> </tr> <tr> <td> <a class="arect" href="tests.php"> french quizzes </a> </td> </tr> <tr> <td> <a class="arect" href="examples.php"> examples of use </a> </td> </tr> <tr> <td> <a class="arect" href="pronunciation.php"> french pronunciation </a> </td> </tr> <tr> <td> <a class="arect" href="news.php"> news in french </a> </td> </tr> <tr> <td> <a class="arect" href="applications.php"> language software </a> </td> </tr> <tr> <td> <a class="arect" href="mobile.php"> mobile phones </a> </td> </tr> </tbody> </table> <img alt="" border="0" height="0" src="http://whos.amung.us/swidget/fnhahzdo0ncz.gif" style="display:none;" width="0"/> </td> <td align="left" bgcolor="#ffffff" valign="top" width="90%"> <script type="text/javascript"> <!-- google_ad_client = "ca-pub-7058441231119392"; /* online-languages */ google_ad_slot = "3704078504"; google_ad_width = 728; google_ad_height = 90; //--> </script> <script src="http://pagead2.googlesyndication.com/pagead/show_ads.js" type="text/javascript"> </script> <br/> <br/> <div align="justify"> <div id="content"> <iframe frameborder="0" height="650" src="http://www.dicts.info/examples.php?lang=french&disa=1" width="95%"> </iframe> </div> </div> 
<!-- cookieconsent2 silktide --> <script type="text/javascript"> window.cookieconsent_options = { learnmore: 'more info', message: 'this website uses cookies personalize content , improve experience on our website.', link: 'https://www.google.com/policies/technologies/cookies/', theme: 'light-bottom' }; </script> <script src="https://s3.amazonaws.com/cc.silktide.com/cookieconsent.latest.min.js" type="text/javascript"> </script> <noscript> <p>we recommend enable javascript take full advantage of website.</p> </noscript> </td> </tr> </tbody> </table> <br/> <table width="700"> <tbody> <tr> <td align="center"> <a href="../english"> <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/anglictina"/> <br/> english </a> </td> <td align="center"> <a href="../german"> <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/spanelstina"/> <br/> german </a> </td> <td align="center"> <a href="../french"> <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/francouzstina"/> <br/> french </a> </td> <td align="center"> <a href="../spanish"> <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/spanelstina"/> <br/> spanish </a> </td> <td align="center"> <a href="../russian"> <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/rustina"/> <br/> russian </a> </td> <td align="center"> <a href="../chinese"> <img alt="" border="0" height="60" src="http://fimg.seznam.cz/?spec=ft100x75&url=http://www.jazyky-online.info/cinstina"/> <br/> chinese </a> </td> </tr> </tbody> </table> <br/> <br/> <table cellpadding="10" style="background:url(img/bgfoot.jpg);" width="100%"> <tbody> <tr> <td align="center"> <font color="#0000aa"> <a href="../licence.html"> licence </a> | <a href="../licence.html"> terms of use </a> 
| <a href="../licence.html#disclaimer"> disclaimer </a> | <a href="../licence.html#privacy"> privacy policy </a> | <a href="http://www.dicts.info/contact.php?s=online-languages"> contact </a> </font> <br/> copyright © 2007-2017, online-languages.info </td> </tr> </tbody> </table> </div> <script type="text/javascript"> var gajshost = (("https:" == document.location.protocol) ? "https://ssl." : "http://www."); document.write(unescape("%3cscript src='" + gajshost + "google-analytics.com/ga.js' type='text/javascript'%3e%3c/script%3e")); </script> <script type="text/javascript"> try { var pagetracker = _gat._gettracker("ua-8795372-1"); pagetracker._trackpageview(); } catch(err) {} </script> </body> </html>
This works for me. Notice that I used the GET method and the URI referenced in the actual form on the page.
    import requests

    word = 'manger'
    url = 'http://www.dicts.info/examples.php'
    # Referer header pointing at the form page on dicts.info
    headers = {'Referer': 'http://www.dicts.info/examples.php?disa=1&lang2=french&word=bon&go=search'}
    params = {'word': word, 'disa': '1', 'lang2': 'french'}
    response = requests.get(url, params=params, headers=headers)
    print(response.text)
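The dicts.info address can be spotted in the question's output: the `content` div holds only an iframe whose `src` points at `http://www.dicts.info/examples.php`, so the search form lives there rather than on online-languages.info. As a rough sketch of how one might find such iframes programmatically, the snippet below uses only the standard library's `html.parser`. The names `IframeFinder`, `find_iframe_sources`, and `sample` are made up for illustration, and `sample` is a fragment copied from the output in the question rather than a live response.

```python
from html.parser import HTMLParser
from urllib.parse import parse_qs, urlsplit


class IframeFinder(HTMLParser):
    """Collect the src attribute of every <iframe> tag that is fed in."""

    def __init__(self):
        super().__init__()
        self.sources = []

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs for the current tag
        if tag == "iframe":
            src = dict(attrs).get("src")
            if src:
                self.sources.append(src)


def find_iframe_sources(html):
    """Return the src of each iframe found in an HTML string (hypothetical helper)."""
    finder = IframeFinder()
    finder.feed(html)
    finder.close()
    return finder.sources


# Fragment copied from the HTML output shown in the question
sample = (
    '<div id="content">'
    '<iframe frameborder="0" height="650" '
    'src="http://www.dicts.info/examples.php?lang=french&disa=1" width="95%">'
    '</iframe></div>'
)

for src in find_iframe_sources(sample):
    parts = urlsplit(src)
    # Show which host and script actually serve the form, plus its query parameters
    print(parts.netloc, parts.path, parse_qs(parts.query))
```

With a real response, `response.text` would take the place of `sample`. An iframe `src` is a good first thing to check before assuming that JavaScript rendering is involved.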
Update
It appears that the PHP page checks to make sure an appropriate Referer header is sent with the request. Add one, as I did above (I edited the original).
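To check what is about to be sent before touching the network, one option is to build the request object and inspect it. The sketch below uses the standard library's `urllib.request.Request` rather than requests, purely so that it runs offline. `build_request`, `BASE_URL`, and `FORM_REFERER` are hypothetical names, and the Referer value is simply the one used in the answer. Whether the server accepts any other Referer value is an open assumption.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "http://www.dicts.info/examples.php"
# Referer value taken from the answer; other values are untested assumptions
FORM_REFERER = (
    "http://www.dicts.info/examples.php"
    "?disa=1&lang2=french&word=bon&go=search"
)


def build_request(word, referer=FORM_REFERER):
    """Build (but do not send) a GET request carrying a Referer header."""
    query = urlencode({"word": word, "disa": "1", "lang2": "french"})
    return Request(f"{BASE_URL}?{query}", headers={"Referer": referer})


req = build_request("manger")
print(req.get_method())           # HTTP method that would be used
print(req.full_url)               # URL including the encoded query string
print(req.get_header("Referer"))  # header the PHP page appears to check
```

Passing `req` to `urllib.request.urlopen` would actually send it. With requests, the equivalent is the `headers=` argument shown in the answer.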