html - Web scraping in R


I want to scrape this web site,

in particular the information in its table.

Please note that I choose a specific date in the upper right corner.

By following this guide,

I wrote the following code:

    library(rvest)

    url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'
    webpage_nba <- read_html(url_nba)

    # Using CSS selectors to scrape the rankings section
    data_nba <- html_nodes(webpage_nba, '#standings-table')

    # Converting the ranking data to text
    data_nba <- html_text(data_nba)
    write.csv(data_nba, "web scraping test.csv")

From my understanding, the numbers I want (e.g. Warriors: 94%, 79%, 66%, 59%) are "coded" in a different way. In other words, what gets written to web scraping test.csv is not readable.

Is there a way to transform the "coded numbers" into "regular numbers"?

I tried to parse the data using rvest alone, but the challenging part here is clicking the dropdown menu, which is represented by a <select> tag in the HTML structure. So I brought in the heavy artillery: RSelenium, a browser emulator. With it the task became easy, thanks to an answer on SO:

    library(RSelenium)
    library(rvest)

    url_nba <- 'https://projects.fivethirtyeight.com/2017-nba-predictions/'

    # Initiate RSelenium. If it doesn't work, try other browser engines
    rd <- rsDriver(port = 4444L, browser = "firefox")
    remdr <- rd$client

    # Navigate to the main page
    remdr$navigate(url_nba)

    # Find the box and click option 10 (April 14, before the playoffs)
    webelem <- remdr$findElement(using = 'xpath', value = "//*[@id='forecast-selector']/div[2]/select/option[10]")
    webelem$clickElement()

    # Save the HTML
    webpage <- remdr$getPageSource()[[1]]

    # Close RSelenium
    remdr$close()
    rd[["server"]]$stop()

    # Select one of the tables and get a data frame
    webpage_nba <- read_html(webpage) %>% html_table(fill = TRUE)
    df <- webpage_nba[[3]]

    # Clean up the data frame
    names(df) <- df[3,]
    df <- tail(df, -3)
    df <- head(df, -4)
    df <- df[ , -which(names(df) == "NA")]

    df

        elo carm-elo 1-week change          team conf. conf. semis conf. finals finals win title
    4  1770     1792           -14      warriors  west         94%          79%    66%       59%
    5  1661     1660           -43         spurs  west         90%          62%    15%       11%
    6  1600     1603           +33       raptors  east         77%          47%    25%        5%
    7  1636     1640           +33      clippers  west         58%          11%     7%        5%
    8  1587     1589           -22       celtics  east         70%          42%    24%        4%
    9  1587     1584            -9       wizards  east         79%          38%    21%        4%
    10 1617     1609           +16          jazz  west         42%           7%     5%        3%
    11 1602     1606           -18       rockets  west         70%          27%     5%        3%
    12 1545     1541           -22     cavaliers  east         59%          27%    11%        2%
    13 1519     1523           +25         bulls  east         30%          15%     7%       <1%
    14 1526     1520           +37        pacers  east         41%          17%     6%       <1%
    15 1563     1564            +6 trail blazers  west          6%           3%     1%       <1%
    16 1543     1537           -20       thunder  west         30%           8%    <1%       <1%
    17 1502     1502            -3         bucks  east         23%           9%     3%       <1%
    18 1479     1469           +46         hawks  east         21%           6%     2%       <1%
    19 1482     1480           -41     grizzlies  west         10%           3%    <1%       <1%
    20 1569     1555           +32          heat  east           —            —      —         —
    21 1552     1533           +27       nuggets  west           —            —      —         —
    22 1482     1489           -12      pelicans  west           —            —      —         —
    23 1463     1472           -18  timberwolves  west           —            —      —         —
    24 1463     1462           -40       hornets  east           —            —      —         —
    25 1441     1436           +22       pistons  east           —            —      —         —
    26 1420     1421           -20     mavericks  west           —            —      —         —
    27 1393     1395            -2         kings  west           —            —      —         —
    28 1374     1379           -13        knicks  east           —            —      —         —
    29 1367     1370           +47        lakers  west           —            —      —         —
    30 1372     1370           -14          nets  east           —            —      —         —
    31 1352     1355            -9         magic  east           —            —      —         —
    32 1338     1348           -29         76ers  east           —            —      —         —
    33 1340     1337           +26          suns  west           —            —      —         —
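
The percentage columns in the output above are still character strings ("94%", "<1%", "—"), so they need one more step before they can be used as numbers. The following is a minimal sketch in base R, not part of the original answer. The helper name pct_to_numeric is hypothetical, and mapping "<1%" to 0.5 and the dash to NA is an assumption chosen for illustration; pick whatever convention suits your analysis.

```r
# Hypothetical helper: convert strings such as "94%", "<1%" or a dash to numbers.
# Assumptions (not from the original answer): "<1%" becomes 0.5, and anything
# that is not a plain "NN%" value (for example the dash shown for teams that
# are out of the playoffs) becomes NA.
pct_to_numeric <- function(x) {
  x <- trimws(x)
  out <- rep(NA_real_, length(x))

  below_one <- !is.na(x) & x == "<1%"   # special-case the "<1%" label
  plain     <- grepl("^[0-9]+%$", x)    # ordinary values such as "94%"

  out[below_one] <- 0.5
  out[plain]     <- as.numeric(sub("%$", "", x[plain]))
  out
}

pct_to_numeric(c("94%", "<1%", "\u2014"))
```

Applied to one of the probability columns of df, this returns a numeric vector of the same length, which can then replace the original column.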

If you want to parse other time periods, check the corresponding option value in the HTML of the page using your browser's dev tools.
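
As an alternative to reading the options by eye in dev tools, they can be listed programmatically. The sketch below runs on a small inline HTML string that only mimics the structure implied by the XPath in the answer; the option values and labels in it are invented placeholders, not the real ones from the page.

```r
library(rvest)

# Hypothetical stand-in for the page's date selector. The real values and
# labels must be read from the live page source.
html <- '<div id="forecast-selector"><div></div><div>
  <select>
    <option value="a">Date A</option>
    <option value="b">Date B</option>
  </select></div></div>'

page <- read_html(html)

# Same XPath as in the answer, but without the [10] index, so every option matches
opts <- html_nodes(page, xpath = "//*[@id='forecast-selector']/div[2]/select/option")

option_table <- data.frame(
  position = seq_along(opts),           # index to use in option[...]
  value    = html_attr(opts, "value"),  # the option's value attribute
  label    = html_text(opts),           # the text shown in the dropdown
  stringsAsFactors = FALSE
)
option_table
```

With a live session, passing the string returned by getPageSource() to read_html() in place of the inline HTML should give the real list, and the position column then tells you which index to put in option[...].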
