Scrapy - Graph of Extracted Links
Can anyone tell me if it is possible to get analytics on the links extracted by a crawler? I know there is an analytics API, but I can't quite figure out how to use it, and the docs are pretty scant.

I'm trying to troubleshoot why the crawler is extracting some links and not others. For example, I start the crawl on a home page where there are links to URLs containing the word "business", but the following rule does not return any items.
rules = (
    Rule(LinkExtractor(allow=('business', )), callback='parse_item', follow=True),
)
It would be great if there were a way to log some sort of graph of the extracted links, but I cannot find a way to implement this.
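For context, a stripped-down version of the spider looks roughly like this (the spider name and start URL are placeholders, not the real site):

from scrapy.spiders import CrawlSpider, Rule
from scrapy.linkextractors import LinkExtractor

class BusinessSpider(CrawlSpider):
    # placeholder name and start URL; the real crawl starts on my site's home page
    name = 'business'
    start_urls = ['http://www.example.com/']

    rules = (
        Rule(LinkExtractor(allow=('business', )), callback='parse_item', follow=True),
    )

    def parse_item(self, response):
        # just record the URL so I can see which pages the rule matched
        yield {'url': response.url}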
I think an easier way to test your rule is to test the LinkExtractor object using the scrapy shell.
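For example, something along these lines in the shell (the URL here is just a placeholder for your home page) shows exactly which links the extractor would pick up:

$ scrapy shell 'http://www.example.com/'
>>> from scrapy.linkextractors import LinkExtractor
>>> le = LinkExtractor(allow=('business', ))
>>> le.extract_links(response)  # returns the list of Link objects matching the pattern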
Also, assuming you're talking about CrawlSpider, I think there's no built-in way of doing that. Nonetheless, if you want to generate some sort of directed graph, you can subclass LinkExtractor and override the extract_links method to print the "graph edges", like:
import logging

from scrapy.linkextractors import LinkExtractor

logger = logging.getLogger('verboselinkextractor')


class VerboseLinkExtractor(LinkExtractor):
    def extract_links(self, response):
        links = super(VerboseLinkExtractor, self).extract_links(response)
        for link in links:
            # log each "graph edge": source page ==> extracted link
            logger.debug("{} ==> {}".format(response.url, link.url))  # or a simple print
        return links
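You could then use the subclass in place of the plain LinkExtractor in your rule, for example:

rules = (
    Rule(VerboseLinkExtractor(allow=('business', )), callback='parse_item', follow=True),
)

Note that the debug messages will only show up if the log level allows it, e.g. with LOG_LEVEL = 'DEBUG' in the project settings.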