Scraping data with Python: using the Scrapy framework - JobPlus

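First, note that Scrapy has to be installed separately:
>>> pip3 install scrapy
Then run scrapy -h to see how the commands are used. The manual tells you which commands need a Scrapy project and which do not.
For example, startproject (which creates a Scrapy project) works without a project, while crawl (which runs a spider) must be run inside one.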

bogon:~ zhangxiaojing$ scrapy
Scrapy 1.5.0 - no active project

Usage:
  scrapy <command> [options] [args]

Available commands:
  bench         Run quick benchmark test
  fetch         Fetch a URL using the Scrapy downloader
  genspider     Generate new spider using pre-defined templates
  runspider     Run a self-contained spider (without creating a project)
  settings      Get settings values
  shell         Interactive scraping console
  startproject  Create new project
  version       Print Scrapy version
  view          Open URL in browser, as seen by Scrapy

  [ more ]      More commands available when run from project directory

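Global commands:
startproject
genspider
settings
runspider
shell
fetch
view
version
bench

Project-only commands:
crawl
check
list
edit
parse

genspider and bench are listed as global because Scrapy 1.5 runs them without a project (the help output above shows them with no active project). Older versions also had a project-only deploy command; it was removed in Scrapy 1.0, and deployment now goes through scrapyd-deploy from the separate scrapyd-client package.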

2. Project directory layout

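tutorial/
    scrapy.cfg            deploy configuration file
    tutorial/             the project's Python module, where the crawler code lives
        __init__.py
        items.py          item definitions: each scrapy.Field() is a field the spider scrapes
        pipelines.py      item pipelines: uploading images, saving to a database, etc.
        settings.py       project settings used together with the pipelines; the pipelines you write are enabled here
        spiders/          the spider files; a project can hold as many spiders as you need
            __init__.py
            ...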

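pipelines.py is where scraped items get saved, for example to a database. A minimal sketch using the standard-library sqlite3 module as a stand-in for whatever database the article had in mind; the class name SQLitePipeline, the database path, the quotes table and the text/author fields are all hypothetical. Scrapy calls open_spider, process_item and close_spider on every enabled pipeline.

```python
import sqlite3


class SQLitePipeline:
    """Hypothetical pipeline that stores each item as a row in SQLite."""

    def __init__(self, db_path="pachong.db"):
        self.db_path = db_path
        self.conn = None

    def open_spider(self, spider):
        # Called once when the spider opens: connect and make sure the table exists
        self.conn = sqlite3.connect(self.db_path)
        self.conn.execute(
            "CREATE TABLE IF NOT EXISTS quotes (text TEXT, author TEXT)"
        )

    def process_item(self, item, spider):
        # Called for every item the spider yields; works for dicts and Items
        self.conn.execute(
            "INSERT INTO quotes (text, author) VALUES (?, ?)",
            (item.get("text"), item.get("author")),
        )
        # Return the item so any later pipeline receives it too
        return item

    def close_spider(self, spider):
        # Called once when the spider closes: persist the rows and release the file
        self.conn.commit()
        self.conn.close()
```

Like any pipeline, it only runs after it has been listed in ITEM_PIPELINES in settings.py.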
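Steps for using Scrapy:
scrapy startproject pachong
cd pachong
tree .
cd pachong/spiders ⇒ spiders/ sits inside the inner pachong/ module directory
vi pachong.py ⇒ the spider file
cd ../
vi items.py ⇒ the item fields the spider should scrape
vi pipelines.py ⇒ methods for uploading images, saving data to a database, etc.
vi settings.py ⇒ settings such as database connection fields, which pipelines to use, and the image storage path
scrapy crawl pachong ⇒ run the spider (pachong is the value of the spider's name attribute, not the file name)
scrapy crawl --logfile=log.txt pachong ⇒ run the spider and write its log to log.txt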

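settings.py is plain Python. A fragment, assuming the default PachongPipeline class that scrapy startproject pachong generates in pachong/pipelines.py (the generated settings.py contains the same ITEM_PIPELINES entry, commented out). ImagesPipeline and IMAGES_STORE are built into Scrapy; the MYSQL_* names are hypothetical custom keys that Scrapy itself does not define, which your own pipeline would have to read through crawler.settings.

```python
# pachong/settings.py (fragment)

# Enable pipelines: keys are dotted class paths, values (customarily 0-1000)
# set the order, and lower numbers run first.
ITEM_PIPELINES = {
    # Built-in image pipeline: needs Pillow and items with an image_urls field
    "scrapy.pipelines.images.ImagesPipeline": 1,
    # Pipeline class generated by `scrapy startproject pachong`
    "pachong.pipelines.PachongPipeline": 300,
}

# Directory where ImagesPipeline stores the downloaded images
IMAGES_STORE = "images"

# Hypothetical custom keys for a database pipeline (not Scrapy settings)
MYSQL_HOST = "localhost"
MYSQL_DB = "pachong"
```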