html - Web scraping in R
I am trying to web scrape this website, in particular the information in the table.
Please note that I choose a specific date in the upper-right corner.
Following this guide, I wrote the following code:
```r
library(rvest)

url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'
webpage_nba <- read_html(url_nba)

# Using CSS selectors to scrape the rankings section
data_nba <- html_nodes(webpage_nba, '#standings-table')

# Converting the ranking data to text
data_nba <- html_text(data_nba)

write.csv(data_nba, "web scraping test.csv")
```

From my understanding, the numbers I want (e.g. Warriors 94%, 79%, 66%, 59%) are "coded" in a different way. In other words, what is written to web scraping test.csv is not readable.
Is there a way I can transform these "coded numbers" into "regular numbers"?
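Separately from getting readable text out of the page, the probabilities in this table are still character strings once scraped: values like "94%", "<1%", and a dash for eliminated teams. A minimal base-R sketch for turning them into numbers, assuming those are the only three formats; `parse_pct` is a hypothetical helper, and mapping "<1%" to 0.5 is an arbitrary choice you may want to change:

```r
# Hypothetical helper: convert percentage strings such as "94%" to numbers.
# Assumptions: "<1%" becomes `below_one` (0.5 by default, an arbitrary choice),
# and anything else that is not "<digits>%" (e.g. the dash used for
# eliminated teams) becomes NA.
parse_pct <- function(x, below_one = 0.5) {
  x <- trimws(x)
  out <- rep(NA_real_, length(x))
  is_lt1 <- x %in% "<1%"                 # %in% is FALSE (not NA) for NA input
  is_pct <- grepl("^[0-9.]+%$", x)       # plain values such as "94%" or "5%"
  out[is_pct] <- as.numeric(sub("%", "", x[is_pct], fixed = TRUE))
  out[is_lt1] <- below_one
  out
}

parse_pct(c("94%", "79%", "<1%", "\u2014"))  # expected: 94, 79, 0.5, NA
```

The result is on a 0 to 100 scale; divide by 100 if you prefer proportions.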
I tried to parse the data using rvest, but the challenging part here seems to be clicking the dropdown menu, which is represented by a <select> tag in the HTML structure. So I brought out the heavy artillery: RSelenium, which drives a real browser. Using it, the task became easy, following this answer on SO:
```r
library(RSelenium)
library(rvest)

url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'

# Initiate RSelenium. If it doesn't work, try other browser engines
rD <- rsDriver(port = 4444L, browser = "firefox")
remDr <- rD$client

# Navigate to the main page
remDr$navigate(url_nba)

# Find the dropdown box and click option 10 (April 14, before the playoffs)
webElem <- remDr$findElement(using = 'xpath', value = "//*[@id='forecast-selector']/div[2]/select/option[10]")
webElem$clickElement()

# Save the rendered HTML
webpage <- remDr$getPageSource()[[1]]

# Close RSelenium
remDr$close()
rD[["server"]]$stop()

# Parse all tables on the page and take the one with the standings
webpage_nba <- read_html(webpage) %>% html_table(fill = TRUE)
df <- webpage_nba[[3]]

# Clean the data frame: use row 3 as the header, drop the rows above it
# and the 4 footer rows
names(df) <- df[3, ]
df <- tail(df, -3)
df <- head(df, -4)
# Drop the columns without a name; unlike -which(...), this does not
# drop every column when no such column exists
df <- df[ , !(is.na(names(df)) | names(df) == "NA")]
df
```

Output:

```
    Elo CARM-Elo 1-week change          Team Conf. Conf. semis Conf. finals Finals Win title
4  1770     1792           -14      Warriors  West         94%          79%    66%       59%
5  1661     1660           -43         Spurs  West         90%          62%    15%       11%
6  1600     1603           +33       Raptors  East         77%          47%    25%        5%
7  1636     1640           +33      Clippers  West         58%          11%     7%        5%
8  1587     1589           -22       Celtics  East         70%          42%    24%        4%
9  1587     1584            -9       Wizards  East         79%          38%    21%        4%
10 1617     1609           +16          Jazz  West         42%           7%     5%        3%
11 1602     1606           -18       Rockets  West         70%          27%     5%        3%
12 1545     1541           -22     Cavaliers  East         59%          27%    11%        2%
13 1519     1523           +25         Bulls  East         30%          15%     7%       <1%
14 1526     1520           +37        Pacers  East         41%          17%     6%       <1%
15 1563     1564            +6 Trail Blazers  West          6%           3%     1%       <1%
16 1543     1537           -20       Thunder  West         30%           8%    <1%       <1%
17 1502     1502            -3         Bucks  East         23%           9%     3%       <1%
18 1479     1469           +46         Hawks  East         21%           6%     2%       <1%
19 1482     1480           -41     Grizzlies  West         10%           3%    <1%       <1%
20 1569     1555           +32          Heat  East           —            —      —         —
21 1552     1533           +27       Nuggets  West           —            —      —         —
22 1482     1489           -12      Pelicans  West           —            —      —         —
23 1463     1472           -18  Timberwolves  West           —            —      —         —
24 1463     1462           -40       Hornets  East           —            —      —         —
25 1441     1436           +22       Pistons  East           —            —      —         —
26 1420     1421           -20     Mavericks  West           —            —      —         —
27 1393     1395            -2         Kings  West           —            —      —         —
28 1374     1379           -13        Knicks  East           —            —      —         —
29 1367     1370           +47        Lakers  West           —            —      —         —
30 1372     1370           -14          Nets  East           —            —      —         —
31 1352     1355            -9         Magic  East           —            —      —         —
32 1338     1348           -29         76ers  East           —            —      —         —
33 1340     1337           +26          Suns  West           —            —      —         —
```

If you want to parse other time periods, check the option values in the page's HTML using your browser's dev tools.
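Selecting the option by its position (`option[10]`) breaks silently if the list of dates changes. A minimal sketch of the alternative, assuming the `<select>` stays at the same place as in the XPath used above and that you copy the option's `value` attribute from the dev tools; `option_xpath` is a hypothetical helper, not part of RSelenium:

```r
# Hypothetical helper (not part of RSelenium): build an XPath that selects a
# forecast date by the <option>'s value attribute instead of its position.
# Assumes the <select> is still at //*[@id='forecast-selector']/div[2]/select.
option_xpath <- function(value) {
  # A single quote would end the XPath string literal early, so reject it
  if (any(grepl("'", value, fixed = TRUE))) {
    stop("value must not contain a single quote")
  }
  sprintf("//*[@id='forecast-selector']/div[2]/select/option[@value='%s']",
          value)
}

# Usage with the remDr client from the code above; replace <value> with the
# string copied from the page's HTML:
# webElem <- remDr$findElement(using = "xpath", value = option_xpath("<value>"))
# webElem$clickElement()
```

Matching on `@value` rather than on the visible label avoids depending on how the date text is formatted.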