excel - How to pull listed data from several web pages - Cannot locate table tag -


first of have read alot of different answers online in relation topic have admit struggling adapt them need please appreciated!

i need extract data listed on following webpage (pages 1-7) i.e. fund name, price, currency etc https://toolkit.financialexpress.net/santanderam , pull data through excel.

i have below code open ie page (which working):

' return document containg dom of page strwebaddress ' returns nothing if timeout lngtimeoutinseconds reached public function getiedocument(byval strwebaddress string, optional byval lngtimeoutinseconds long = 15) mshtml.htmldocument dim ie shdocvw.internetexplorer dim iedocument mshtml.htmldocument dim datenow date  ' create ie application, representing tab set ie = new shdocvw.internetexplorer  ' optionally make application visible, though work fine in background otherwise ie.visible = true  ' open webpage in tab represented ie , wait until main request finished ' times out after lngtimeoutinseconds warning ie.navigate strwebaddress datenow = while ie.busy     if > dateadd("s", lngtimeoutinseconds, datenow) exit function loop  ' retrieve webpage's content (that is, html dom) , wait until loaded (images, etc.) ' times out after lngtimeoutinseconds warning set iedocument = ie.document datenow = while iedocument.readystate <> "complete"     if > dateadd("s", lngtimeoutinseconds, datenow) exit function loop  set getiedocument = iedocument end function 

however cannot find table tag contains other tags interested allow rest of code pull through data, below code have far:

public sub getteamdata() dim strwebaddress string dim strh2anchorcontent string dim iedocument mshtml.htmldocument dim objh2 mshtml.htmlheaderelement dim objtable mshtml.htmltable dim objrow mshtml.htmltablerow dim objcell mshtml.htmltablecell dim lngrow long dim lngcolumn long  ' initialize variables should better passed paramaters or defined constants strwebaddress = "https://toolkit.financialexpress.net/santanderam" strh2anchorcontent = "   "  ' open page set iedocument = getiedocument(strwebaddress) if iedocument nothing     msgbox "timeout reached opening address:" & vbnewline & strwebaddress, vbcritical     exit sub end if  ' retrieve anchor element each objh2 in iedocument.getelementsbytagname("h2")    if objh2.innertext = strh2anchorcontent exit next objh2 if objh2 nothing     msgbox "could not find """ & strh2anchorcontent & """ in dom!", vbcritical     exit sub end if  ' traverse html tree desired table element ' * move 1 element in hierarchy ' * skip 2 elements proceed third (interjected each time whitespace interpreted element of own) ' * move down 2 elements n hierarchy set objtable = objh2.parentelement _                     .nextsibling.nextsibling _                     .nextsibling.nextsibling _                     .nextsibling.nextsibling _                     .children(0) _                    .children(0)   '   iterate on table , output contents lngrow = 1  each objrow in objtable.rows     lngcolumn = 1      each objcell in objrow.cells          cells(lngrow, lngcolumn) = objcell.innertext         lngcolumn = lngcolumn + 1      next objcell      lngrow = lngrow + 1  next end sub 

i assuming if can locate correct table tag enter in line below:

 strh2anchorcontent = "  " 

then above work? if can finding correct tag or advise going wrong above?

again appreciated!

thanks

edit 1

updated code:

    ' open webpage in tab represented ie , wait until main request finished ' times out after lngtimeoutinseconds warning ie.navigate strwebaddress datenow = while ie.busy     if > dateadd("s", lngtimeoutinseconds, datenow) exit function loop  ' retrieve webpage's content (that is, html dom) , wait until loaded (images, etc.) ' times out after lngtimeoutinseconds warning set iedocument = ie.document datenow = while iedocument.readystate <> "complete"     if > dateadd("s", lngtimeoutinseconds, datenow) exit function loop  set getiedocument = iedocument end function  public sub getteamdata() dim strwebaddress string dim strh2anchorcontent string dim iedocument mshtml.htmldocument dim objh2 mshtml.htmlheaderelement dim obtable mshtml.htmltable dim objrow mshtml.htmltablerow dim objcell mshtml.htmltablecell dim lngrow long dim lngcolumn long  ' initialize variables should better passed paramaters or defined constants strwebaddress = "https://toolkit.financialexpress.net/santanderam"   ' open page set iedocument = getiedocument(strwebaddress) if iedocument nothing     msgbox "timeout reached opening address:" & vbnewline &     strwebaddress, vbcritical     exit sub end if  ' retrieve anchor element set otable = iedocument.getelementbyid("price_1_1") debug.print otable.innertext  ' iterate on table , output contents lngrow = 1 each objrow in otable.rows     lngcolumn = 1     each objcell in objrow.cells         cells(lngrow, lngcolumn) = objcell.innertext         lngcolumn = lngcolumn + 1     next objcell     lngrow = lngrow + 1 next end sub 

your code working fine, problem trying capture data table before loaded. added simple wait loop 5 seconds , current code captured data. below loop added before set otable = iedocument.getelementbyid("price_1_1") statement:

datenow = bexitloop = false lngtimeoutinseconds = 5 while not bexitloop     if > dateadd("s", lngtimeoutinseconds, datenow) exit loop 

code above static 5 second wait. make more dynamic.. i'll leave there brain teaser :)


Comments

Popular posts from this blog

python - Selenium remoteWebDriver (& SauceLabs) Firefox moseMoveTo action exception -

html - How to custom Bootstrap grid height? -

transpose - Maple isnt executing function but prints function term -