python - Unable to log in with Scrapy


I'm trying to scrape a page that requires logging in first, but for some reason Scrapy ends up crawling a page that has nothing to do with it after I use FormRequest. See the code below:

# coding: utf-8
import scrapy
from scrapy.http import Request, FormRequest

usuario = 'myemail'
senha = 'mypassword'
urlLogin = 'https://ludopedia.com.br/login'
urlNotificacoes = 'https://ludopedia.com.br/notificacoes'


class Notificacao(scrapy.Item):
    """Holds the data of the Ludopedia listings."""
    jogo = scrapy.Field()
    colecao = scrapy.Field()
    tipo = scrapy.Field()
    link = scrapy.Field()


class LoginSpider(scrapy.Spider):
    name = 'ludopedia'

    custom_settings = {
        'CONCURRENT_REQUESTS': 1,
        'LOG_LEVEL': 'DEBUG',
    }
    start_urls = [urlLogin]

    def parse(self, response):
        return FormRequest.from_response(
            response,
            formname='form',
            formid='form',
            formdata={'email': usuario, 'pass': senha},
            callback=self.after_login,
            dont_filter=True,
        )

    def after_login(self, response):
        # check that the login succeeded before going on
        if b"minha conta" in response.body:
            self.logger.error("login falhou")
            return

        yield Request(urlNotificacoes)

        self.logger.info("visitei %s", response.url)
        msg = response.selector.xpath(
            '//*[@id="page-content"]/div/div/table/tbody/tr[2]/td/a/div[2]/div')
        ...

The output of the script is:

2017-07-25 12:02:55 [scrapy.utils.log] INFO: Scrapy 1.4.0 started (bot: scrapybot)
2017-07-25 12:02:55 [scrapy.utils.log] INFO: Overridden settings: {'SPIDER_LOADER_WARN_ONLY': True}
2017-07-25 12:02:56 [scrapy.middleware] INFO: Enabled extensions:
['scrapy.extensions.memusage.MemoryUsage',
 'scrapy.extensions.logstats.LogStats',
 'scrapy.extensions.telnet.TelnetConsole',
 'scrapy.extensions.corestats.CoreStats']
2017-07-25 12:02:56 [scrapy.middleware] INFO: Enabled downloader middlewares:
['scrapy.downloadermiddlewares.httpauth.HttpAuthMiddleware',
 'scrapy.downloadermiddlewares.downloadtimeout.DownloadTimeoutMiddleware',
 'scrapy.downloadermiddlewares.defaultheaders.DefaultHeadersMiddleware',
 'scrapy.downloadermiddlewares.useragent.UserAgentMiddleware',
 'scrapy.downloadermiddlewares.retry.RetryMiddleware',
 'scrapy.downloadermiddlewares.redirect.MetaRefreshMiddleware',
 'scrapy.downloadermiddlewares.httpcompression.HttpCompressionMiddleware',
 'scrapy.downloadermiddlewares.redirect.RedirectMiddleware',
 'scrapy.downloadermiddlewares.cookies.CookiesMiddleware',
 'scrapy.downloadermiddlewares.httpproxy.HttpProxyMiddleware',
 'scrapy.downloadermiddlewares.stats.DownloaderStats']
2017-07-25 12:02:56 [scrapy.middleware] INFO: Enabled spider middlewares:
['scrapy.spidermiddlewares.httperror.HttpErrorMiddleware',
 'scrapy.spidermiddlewares.offsite.OffsiteMiddleware',
 'scrapy.spidermiddlewares.referer.RefererMiddleware',
 'scrapy.spidermiddlewares.urllength.UrlLengthMiddleware',
 'scrapy.spidermiddlewares.depth.DepthMiddleware']
2017-07-25 12:02:56 [scrapy.middleware] INFO: Enabled item pipelines: []
2017-07-25 12:02:56 [scrapy.core.engine] INFO: Spider opened
2017-07-25 12:02:56 [scrapy.extensions.logstats] INFO: Crawled 0 pages (at 0 pages/min), scraped 0 items (at 0 items/min)
2017-07-25 12:02:56 [scrapy.extensions.telnet] DEBUG: Telnet console listening on 127.0.0.1:6024
2017-07-25 12:02:58 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ludopedia.com.br/login> (referer: None)
2017-07-25 12:02:59 [scrapy.core.engine] DEBUG: Crawled (200) <POST https://ludopedia.com.br/login> (referer: https://ludopedia.com.br/login)
2017-07-25 12:02:59 [ludopedia] INFO: visitei https://ludopedia.com.br/login <200 https://ludopedia.com.br/login>
2017-07-25 12:03:00 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ludopedia.com.br/notificacoes> (referer: https://ludopedia.com.br/login)
2017-07-25 12:03:01 [scrapy.core.engine] DEBUG: Crawled (200) <GET https://ludopedia.com.br/search?search=&email=myemail&pass=mypassword> (referer: https://ludopedia.com.br/notificacoes)
2017-07-25 12:03:01 [scrapy.dupefilters] DEBUG: Filtered duplicate request: <GET https://ludopedia.com.br/notificacoes> - no more duplicates will be shown (see DUPEFILTER_DEBUG to show all duplicates)
2017-07-25 12:03:01 [ludopedia] INFO: visitei https://ludopedia.com.br/search?search=&email=myemail&pass=mypassword <200 https://ludopedia.com.br/search?search=&email=myemail&pass=mypassword>
2017-07-25 12:03:01 [scrapy.core.engine] INFO: Closing spider (finished)
2017-07-25 12:03:01 [scrapy.statscollectors] INFO: Dumping Scrapy stats:
{'downloader/request_bytes': 1357,
 'downloader/request_count': 4,
 'downloader/request_method_count/GET': 3,
 'downloader/request_method_count/POST': 1,
 'downloader/response_bytes': 134813,
 'downloader/response_count': 4,
 'downloader/response_status_count/200': 4,
 'dupefilter/filtered': 1,
 'finish_reason': 'finished',
 'finish_time': datetime.datetime(2017, 7, 25, 15, 3, 1, 355077),
 'log_count/DEBUG': 6,
 'log_count/INFO': 9,
 'memusage/max': 51732480,
 'memusage/startup': 51732480,
 'request_depth_max': 4,
 'response_received_count': 4,
 'scheduler/dequeued': 4,
 'scheduler/dequeued/memory': 4,
 'scheduler/enqueued': 4,
 'scheduler/enqueued/memory': 4,
 'start_time': datetime.datetime(2017, 7, 25, 15, 2, 56, 35121)}
2017-07-25 12:03:01 [scrapy.core.engine] INFO: Spider closed (finished)

So the problem is that, for some reason, I'm being redirected to ludopedia.com.br/search?search=&email=myemail&pass=mypassword and I don't know why.

What I'm trying to do is: visit ludopedia.com.br/login, fill in the form with e-mail and password, then visit ludopedia.com.br/notificacoes and parse the HTML there.

How can I avoid the request to ludopedia.com.br/search?search=&email=myemail&pass=mypassword ?
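The search URL in the log is the clue: `Request(urlNotificacoes)` is yielded without a callback, so its response falls back to the spider's default `parse` method, which runs `FormRequest.from_response` again, this time against the notificacoes page, and submits a form found there. A minimal sketch of how the credentials then end up in a GET query string (the search form markup is an assumption; the real page may differ):

```python
from urllib.parse import urlencode, urljoin

# Hypothetical search form like the one assumed to exist on the
# notificacoes page: a single 'search' field, submitted via GET.
form_action = "/search"
form_fields = {"search": ""}

# FormRequest.from_response merges the extra formdata into the form's own
# fields, so the login credentials ride along as query parameters.
formdata = {"email": "myemail", "pass": "mypassword"}
fields = {**form_fields, **formdata}

url = urljoin("https://ludopedia.com.br/notificacoes",
              form_action + "?" + urlencode(fields))
print(url)
# https://ludopedia.com.br/search?search=&email=myemail&pass=mypassword
```

This reproduces exactly the unwanted URL from the log, which suggests the fix is to give the second request an explicit callback instead of letting it fall through to `parse`.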

I've made it! I think it was a logic problem. Here is the working code:

# coding: utf-8
import scrapy
from scrapy.http import Request, FormRequest

usuario = 'myemail'
senha = 'mypassword'
urlLogin = 'https://ludopedia.com.br/login'
urlNotificacoes = 'https://ludopedia.com.br/notificacoes'


class Notificacao(scrapy.Item):
    """Holds the data of the Ludopedia listings."""
    jogo = scrapy.Field()
    colecao = scrapy.Field()
    tipo = scrapy.Field()
    link = scrapy.Field()


class LoginSpider(scrapy.Spider):
    name = 'ludopedia'

    custom_settings = {
        'CONCURRENT_REQUESTS': 1,
        'LOG_LEVEL': 'DEBUG',
    }
    start_urls = [urlLogin]

    def parse(self, response):
        return FormRequest.from_response(
            response,
            formname='form',
            formid='form',
            formdata={'email': usuario, 'pass': senha},
            callback=self.after_login,
            dont_filter=True,
        )

    def after_login(self, response):
        # check that the login succeeded before going on
        if b"minha conta" in response.body:
            self.logger.error("login falhou")
            return

        request = Request(urlNotificacoes, callback=self.parse_notificacoes)
        yield request

    def parse_notificacoes(self, response):
        msg = response.selector.xpath(
            '//*[@id="page-content"]/div/div/table/tbody/tr[2]/td/a/div[2]/div')
        ...

The difference here is that in `after_login` I build the request for the page I want to scrape with an explicit callback, yield that request, and then parse the page (now logged in) with the `parse_notificacoes` function.
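The callback routing the fix relies on can be sketched with plain Python generators. This is a toy model (the engine loop and names are mine, not Scrapy's API): each callback yields either new requests or items, and each response is delivered to the callback attached to its request, never to a default method.

```python
class Request:
    """Toy request: a URL plus the callback that will get its response."""
    def __init__(self, url, callback):
        self.url = url
        self.callback = callback


def engine(start_request, fetch):
    """Drive requests like Scrapy's scheduler, greatly simplified."""
    pending = [start_request]
    results = []
    while pending:
        req = pending.pop()
        response = fetch(req.url)
        for out in req.callback(response):
            if isinstance(out, Request):
                pending.append(out)   # schedule follow-up requests
            else:
                results.append(out)   # collect scraped items
    return results


visited = []

def after_login(response):
    visited.append(response)
    # Yielding a Request with an explicit callback is what routes the
    # /notificacoes response to parse_notificacoes instead of back to
    # the spider's default parse method.
    yield Request("/notificacoes", callback=parse_notificacoes)

def parse_notificacoes(response):
    visited.append(response)
    yield {"page": response}


# fetch is stubbed out: the "response" is just the URL itself.
items = engine(Request("/login", callback=after_login), fetch=lambda url: url)
print(visited)  # ['/login', '/notificacoes']
print(items)    # [{'page': '/notificacoes'}]
```

In the broken version, the second request carried no callback, so the engine fell back to `parse`, which re-submitted a form; with the callback attached, the response goes straight to the parsing code.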

