r - Follow links in a loop with rvest
I'm trying to learn the rvest package, but the documentation and the examples on the web are either too basic or too complex. I could not find how to use the follow_link function in a loop to browse a number of pages. Perhaps I did not understand the logic at all...
Here is a simplified example of my attempt:
    library(rvest)

    url <- "https://www.wikidata.org/w/index.php?title=special:whatlinkshere/q5&limit=500"
    s <- html_session(url)

    liste <- list()
    for (i in 1:2) {
      # collect the list items on the current page
      data <- s %>%
        read_html() %>%
        html_nodes("#mw-whatlinkshere-list li")
      result <- c(liste, data)
      # try to move the session to the next page of results
      s <- s %>% follow_link(xpath = "//a[text()='next 500']/@href")
    }

I've also tried to avoid following links through the session, like this. It's better, but I'm not sure it's the best and fastest solution:
    liste <- c()
    while (!is.na(url)) {
      # collect the list items on the current page
      data <- url %>%
        read_html() %>%
        html_nodes("#mw-whatlinkshere-list li")
      liste <- c(liste, data)
      # read the page again to build the URL of the next page
      url <- url %>%
        read_html() %>%
        html_node(xpath = "//a[text()='next 500']") %>%
        html_attr("href") %>%
        paste0("https://www.wikidata.org", .)
      print(url)
    }

Any advice is welcome and appreciated.
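For reference, here is a minimal sketch of how the second attempt might be tightened, under a few assumptions. It parses each page once instead of twice, and it checks the href for NA before building the next URL (pasting the domain onto NA gives a string, so the is.na(url) test above never becomes TRUE). The helper name collect_items and its max_pages and fetch arguments are made up for illustration and are not part of rvest. The fetch argument exists only so the loop can be tried on canned HTML without network access. Newer rvest versions call these functions html_elements and html_element, but the older names used here still work.

```r
library(rvest)

# Hypothetical helper (not part of rvest): gather the list items from every
# page, following the "next 500" link until it no longer exists.
collect_items <- function(url, max_pages = Inf, fetch = read_html) {
  pages <- list()
  n <- 0
  while (!is.na(url) && n < max_pages) {
    page <- fetch(url)                       # download and parse each page once
    n <- n + 1
    pages[[n]] <- html_nodes(page, "#mw-whatlinkshere-list li")

    # html_attr() gives NA when the link is missing, i.e. on the last page
    href <- page %>%
      html_node(xpath = "//a[text()='next 500']") %>%
      html_attr("href")

    # resolve the relative href against the current URL; NA ends the loop
    url <- if (is.na(href)) NA_character_ else xml2::url_absolute(href, url)
  }
  pages                                      # one node set per visited page
}
```

Calling collect_items(url, max_pages = 2) with the Wikidata URL from the question should visit the first two pages, assuming the page markup still matches the selectors used in the question. The text of all items can then be pulled out with unlist(lapply(pages, html_text)).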