python - Scrapy: ImportError: No module named pipelines
I'm having issues getting my scraper to load an item pipeline. When I try to add a custom pipeline, I get the following error:
ImportError: No module named pipelines
I've read the documentation, but it doesn't explain how to set the path in the ITEM_PIPELINES option. The example from the docs:
ITEM_PIPELINES = {
    'myproject.pipelines.PricePipeline': 300,
    'myproject.pipelines.JsonWriterPipeline': 800,
}
Where does myproject come from?
Below is the directory structure of my application:
├── readme.md
├── bot.py
├── data
│   └── formax.json
├── pipelines
│   ├── formaxpipeline.py
│   └── __init__.py
├── praw.ini
├── requirements.txt
└── scrapers
    ├── __init__.py
    └── formax.py
In formax.py, the spider class sets its own settings:
custom_settings = {
    'ITEM_PIPELINES': {
        'pipelines.formaxpipeline': 100
    },
    'USER_AGENT': 'Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)'
}
I run this from the root directory:
scrapy runspider scrapers/formax.py -o data/formax.json
The scraper fails with the following error:
ImportError: No module named pipelines
How can I add custom pipelines through the settings?
There are a couple of things. First, Scrapy expects you to have a standard Scrapy project structure; in the docs example, myproject is the project name (and the name of the project's folder).
Second, ITEM_PIPELINES needs to specify the classes of the pipelines, but as far as I can see (from the structure of your application and your custom_settings) you specified only a module. Instead of pipelines.formaxpipeline you should have pipelines.formaxpipeline.FormaxPipeline in the ITEM_PIPELINES setting. (Here I assume the class is named FormaxPipeline and is defined in the formaxpipeline.py file.)
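As a minimal sketch of that second point, the pipeline module and the spider's settings might look like the following. The class name FormaxPipeline and the pass-through body of process_item are assumptions for illustration; the question does not show the real pipeline code.

```python
# Hypothetical contents of pipelines/formaxpipeline.py.
# The class name FormaxPipeline is an assumption, as in the answer above.
class FormaxPipeline:
    def process_item(self, item, spider):
        # Return the item so any later pipeline still receives it.
        return item


# In scrapers/formax.py this dict would sit inside the spider class.
custom_settings = {
    "ITEM_PIPELINES": {
        # Module path plus class name, not just the module.
        "pipelines.formaxpipeline.FormaxPipeline": 100,
    },
}
```

The last dotted component names the class; everything before it is the module that Python has to be able to import.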
But the actual error comes from the fact that Scrapy can't locate the module. I don't know how to solve this more cleanly, because Scrapy doesn't expect this setup (not having a project structure), but one workaround is to run the spider this way:
PYTHONPATH="$PYTHONPATH:." scrapy runspider scrapers/formax.py -o data/formax.json
i.e. tell Python where to find your code.
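To see why the search path matters, here is a small self-contained sketch of resolving a dotted path to a class. The resolve helper is a simplified stand-in, not Scrapy's own loader, and the package name demo_pipelines and the temporary directory are hypothetical; they only mimic the layout from the question.

```python
import importlib
import sys
import tempfile
from pathlib import Path


def resolve(dotted_path):
    """Split 'pkg.module.ClassName' at the last dot, import the module, return the class."""
    module_path, _, name = dotted_path.rpartition(".")
    module = importlib.import_module(module_path)
    return getattr(module, name)


# Build a throwaway "project root" containing demo_pipelines/formaxpipeline.py.
root = Path(tempfile.mkdtemp())
package = root / "demo_pipelines"
package.mkdir()
(package / "__init__.py").write_text("")
(package / "formaxpipeline.py").write_text(
    "class FormaxPipeline:\n"
    "    def process_item(self, item, spider):\n"
    "        return item\n"
)

path = "demo_pipelines.formaxpipeline.FormaxPipeline"

try:
    resolve(path)  # the root is not on sys.path yet, so the import fails
    found_before = True
except ImportError:
    found_before = False

# Adding the root to sys.path is the effect PYTHONPATH has at interpreter startup.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
cls = resolve(path)
```

Before the root directory is on the search path the import raises ImportError, which matches the error in the question; afterwards the same dotted path resolves to the class.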