excel - How to pull listed data from several web pages - Cannot locate table tag -
first of have read alot of different answers online in relation topic have admit struggling adapt them need please appreciated!
i need extract data listed on following webpage (pages 1-7) i.e. fund name, price, currency etc https://toolkit.financialexpress.net/santanderam , pull data through excel.
i have below code open ie page (which working):
' return document containg dom of page strwebaddress ' returns nothing if timeout lngtimeoutinseconds reached public function getiedocument(byval strwebaddress string, optional byval lngtimeoutinseconds long = 15) mshtml.htmldocument dim ie shdocvw.internetexplorer dim iedocument mshtml.htmldocument dim datenow date ' create ie application, representing tab set ie = new shdocvw.internetexplorer ' optionally make application visible, though work fine in background otherwise ie.visible = true ' open webpage in tab represented ie , wait until main request finished ' times out after lngtimeoutinseconds warning ie.navigate strwebaddress datenow = while ie.busy if > dateadd("s", lngtimeoutinseconds, datenow) exit function loop ' retrieve webpage's content (that is, html dom) , wait until loaded (images, etc.) ' times out after lngtimeoutinseconds warning set iedocument = ie.document datenow = while iedocument.readystate <> "complete" if > dateadd("s", lngtimeoutinseconds, datenow) exit function loop set getiedocument = iedocument end function however cannot find table tag contains other tags interested allow rest of code pull through data, below code have far:
public sub getteamdata() dim strwebaddress string dim strh2anchorcontent string dim iedocument mshtml.htmldocument dim objh2 mshtml.htmlheaderelement dim objtable mshtml.htmltable dim objrow mshtml.htmltablerow dim objcell mshtml.htmltablecell dim lngrow long dim lngcolumn long ' initialize variables should better passed paramaters or defined constants strwebaddress = "https://toolkit.financialexpress.net/santanderam" strh2anchorcontent = " " ' open page set iedocument = getiedocument(strwebaddress) if iedocument nothing msgbox "timeout reached opening address:" & vbnewline & strwebaddress, vbcritical exit sub end if ' retrieve anchor element each objh2 in iedocument.getelementsbytagname("h2") if objh2.innertext = strh2anchorcontent exit next objh2 if objh2 nothing msgbox "could not find """ & strh2anchorcontent & """ in dom!", vbcritical exit sub end if ' traverse html tree desired table element ' * move 1 element in hierarchy ' * skip 2 elements proceed third (interjected each time whitespace interpreted element of own) ' * move down 2 elements n hierarchy set objtable = objh2.parentelement _ .nextsibling.nextsibling _ .nextsibling.nextsibling _ .nextsibling.nextsibling _ .children(0) _ .children(0) ' iterate on table , output contents lngrow = 1 each objrow in objtable.rows lngcolumn = 1 each objcell in objrow.cells cells(lngrow, lngcolumn) = objcell.innertext lngcolumn = lngcolumn + 1 next objcell lngrow = lngrow + 1 next end sub i assuming if can locate correct table tag enter in line below:
strh2anchorcontent = " " then above work? if can finding correct tag or advise going wrong above?
again appreciated!
thanks
edit 1
updated code:
' open webpage in tab represented ie , wait until main request finished ' times out after lngtimeoutinseconds warning ie.navigate strwebaddress datenow = while ie.busy if > dateadd("s", lngtimeoutinseconds, datenow) exit function loop ' retrieve webpage's content (that is, html dom) , wait until loaded (images, etc.) ' times out after lngtimeoutinseconds warning set iedocument = ie.document datenow = while iedocument.readystate <> "complete" if > dateadd("s", lngtimeoutinseconds, datenow) exit function loop set getiedocument = iedocument end function public sub getteamdata() dim strwebaddress string dim strh2anchorcontent string dim iedocument mshtml.htmldocument dim objh2 mshtml.htmlheaderelement dim obtable mshtml.htmltable dim objrow mshtml.htmltablerow dim objcell mshtml.htmltablecell dim lngrow long dim lngcolumn long ' initialize variables should better passed paramaters or defined constants strwebaddress = "https://toolkit.financialexpress.net/santanderam" ' open page set iedocument = getiedocument(strwebaddress) if iedocument nothing msgbox "timeout reached opening address:" & vbnewline & strwebaddress, vbcritical exit sub end if ' retrieve anchor element set otable = iedocument.getelementbyid("price_1_1") debug.print otable.innertext ' iterate on table , output contents lngrow = 1 each objrow in otable.rows lngcolumn = 1 each objcell in objrow.cells cells(lngrow, lngcolumn) = objcell.innertext lngcolumn = lngcolumn + 1 next objcell lngrow = lngrow + 1 next end sub
your code working fine, problem trying capture data table before loaded. added simple wait loop 5 seconds , current code captured data. below loop added before set otable = iedocument.getelementbyid("price_1_1") statement:
datenow = bexitloop = false lngtimeoutinseconds = 5 while not bexitloop if > dateadd("s", lngtimeoutinseconds, datenow) exit loop code above static 5 second wait. make more dynamic.. i'll leave there brain teaser :)
Comments
Post a Comment