Scraping a JavaScript website in R
I want to scrape the match time and date from this URL:
http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhx07d/#game-summary
Using Chrome dev tools, I can see that it appears to be generated by the following code:
<td colspan="3" id="utime" class="mstat-date">01:20 am, october 29, 2014</td>
However, it is not in the HTML source.
I think this is because of JavaScript (correct me if I'm wrong). How can I scrape this information using R?
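To check whether a value is really missing from the source, you can parse the raw HTML in R and inspect the element. Below is a minimal sketch that uses two hypothetical, hand-written snippets in place of the real server response (the actual page markup may differ). The cell exists in both snippets, but only the "rendered" one contains its text.

```r
library(rvest)

# Hypothetical snippet standing in for the HTML the server sends:
# the cell is present, but its text would be filled in later by JavaScript.
static_html <- '<table><tr><td colspan="3" id="utime" class="mstat-date"></td></tr></table>'

# Hypothetical snippet standing in for the page after JavaScript has run.
rendered_html <- '<table><tr><td colspan="3" id="utime" class="mstat-date">01:20 am, october 29, 2014</td></tr></table>'

# read_html() treats a string containing "<" as literal HTML, not as a file path
static_text <- read_html(static_html) %>% html_nodes("#utime") %>% html_text()
rendered_text <- read_html(rendered_html) %>% html_nodes("#utime") %>% html_text()

static_text    # empty string: the value is not in the static source
rendered_text  # the date/time only appears after rendering
```

With the real page, you would pass the URL to `read_html()` instead of a snippet. An empty string, or `character(0)` if the element itself is missing, suggests that the value is added by JavaScript in the browser.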
So, RSelenium is not the only answer (anymore). If you can install the PhantomJS binary (grab the phantomjs binaries here: http://phantomjs.org/), you can use it to render the HTML and then scrape it with rvest
(similar to the RSelenium approach, but it doesn't require Java):
library(rvest)

# render the HTML of the site with phantomjs
url <- "http://www.scoreboard.com/game/rosol-l-goffin-d-2014/8drhx07d/#game-summary"

writeLines(sprintf("var page = require('webpage').create();
page.open('%s', function () {
    console.log(page.content); //page source
    phantom.exit();
});", url), con = "scrape.js")

system("phantomjs scrape.js > scrape.html")

# extract the content you need
# (read_html() replaces html(), which current rvest versions no longer provide)
pg <- read_html("scrape.html")
pg %>% html_nodes("#utime") %>% html_text()

## [1] "10:20 am, october 28, 2014"
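If you need to render several pages, you can wrap the same steps in a function. Such a function can write to temporary files instead of leaving `scrape.js`/`scrape.html` in the working directory, and it can stop when PhantomJS fails. Below is a minimal sketch. `phantom_script()` and `render_page()` are hypothetical helper names, and the sketch assumes that the `phantomjs` binary is on your PATH.

```r
# Hypothetical helper: build the PhantomJS script that prints the rendered page
phantom_script <- function(url) {
  sprintf("var page = require('webpage').create();
page.open('%s', function () {
  console.log(page.content); // rendered page source
  phantom.exit();
});", url)
}

# Hypothetical helper: render `url` with PhantomJS and return the parsed HTML.
# Assumes the `phantomjs` binary is on the PATH.
render_page <- function(url) {
  js_file <- tempfile(fileext = ".js")
  html_file <- tempfile(fileext = ".html")
  # remove both temporary files when the function exits
  on.exit(unlink(c(js_file, html_file)))

  writeLines(phantom_script(url), js_file)

  # system2() returns the exit status when stdout is redirected to a file
  status <- system2("phantomjs", shQuote(js_file), stdout = html_file)
  if (status != 0) stop("phantomjs exited with status ", status)

  # the file is parsed into memory before on.exit() deletes it
  rvest::read_html(html_file)
}
```

For example, `render_page(url) %>% html_nodes("#utime") %>% html_text()` should match the result of the code above, provided the page still renders the same way. This has not been checked against the live site.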