Installing pyspider


Operating system

CentOS Linux release 7.0.1406 (Core)

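Python environment

Installing Python

Install the dependencies:
yum install gcc # required to build Python
yum install zlib # this and the next three packages are required by setuptools; if they are installed after Python, Python must be rebuilt with make
yum install zlib-devel
yum install openssl
yum install openssl-devel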

cd Python-2.7.13
./configure --prefix=/python2.7
make
make install

Configure the environment variable
# vi ~/.bash_profile
export PATH=/python2.7/bin:$PATH

Installing pip

Dependency: setuptools

Dependencies: six-1.10.0.tar.gz packaging-16.8.tar.gz pyparsing-2.2.0.tar.gz appdirs-1.4.3.tar.gz

cd pip-9.0.1

# python setup.py install

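Installing pyspider

Download the latest pyspider from GitHub.

System package dependencies:

tcl protobuf libcurl-devel libxslt-devel libxml2

Install them with yum install.

cd pyspider
# install the dependencies, then pyspider itself
pip install -r requirements.txt
python setup.py install

The mysql-connector version listed in requirements.txt could not be downloaded, so a different version was installed instead: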

pip install mysql-connector==2.1.4

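Installing the MySQL database

After installing MySQL with yum, follow http://www.itnose.net/detail/6310643.html to finish the database setup.

# restart mysql
service mysqld restart

# mysql -u root
# change the root password
mysql> use mysql
mysql> update user set password=password('123456') where user='root';


# create the databases and grant privileges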
mysql> create database taskdb;
mysql> create database projectdb;
mysql> create database resultdb;
mysql> create user 'pyspider'@'%';
mysql> create user pyspider@'localhost' identified by 'pyspider-pass';
mysql> grant select,insert,update,references,delete,create,drop,alter,index,trigger,create view,show view,execute,alter routine,create routine,create temporary tables,lock tables,event on taskdb.* to 'pyspider'@'%';
mysql> grant select,insert,update,references,delete,create,drop,alter,index,trigger,create view,show view,execute,alter routine,create routine,create temporary tables,lock tables,event on projectdb.* to 'pyspider'@'%';
mysql> grant select,insert,update,references,delete,create,drop,alter,index,trigger,create view,show view,execute,alter routine,create routine,create temporary tables,lock tables,event on resultdb.* to 'pyspider'@'%';
mysql> flush privileges;
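
The three grant statements above differ only in the database name. As a minimal sketch, assuming a Python 3 interpreter is at hand (separate from the Python 2.7 build in this guide), they could be generated instead of typed by hand. The names build_grants, PRIVILEGES and DATABASES are hypothetical helpers for illustration, not part of pyspider or MySQL.

```python
# Privilege list copied from the grant statements above.
PRIVILEGES = (
    "select,insert,update,references,delete,create,drop,alter,index,trigger,"
    "create view,show view,execute,alter routine,create routine,"
    "create temporary tables,lock tables,event"
)
DATABASES = ("taskdb", "projectdb", "resultdb")


def build_grants(user="pyspider", host="%"):
    """Return one GRANT statement per pyspider database."""
    return [
        "grant {privs} on {db}.* to '{user}'@'{host}';".format(
            privs=PRIVILEGES, db=db, user=user, host=host
        )
        for db in DATABASES
    ]


# Print the statements so they can be pasted into the mysql prompt.
for statement in build_grants():
    print(statement)
```

The printed statements can be pasted at the mysql> prompt, followed by flush privileges; as above.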

Edit the configuration file (in preparation for clustering)
vi /etc/my.cnf
bind-address = 0.0.0.0

# restart the database
service mysqld restart

Installing redis

Download redis and unpack it under /root/training.

Build and install redis:

cd /root/training/redis-2.8.12
make
make test
make install

# in preparation for clustering
cd /root/training/redis-3.2.8
cp redis.conf /etc/

vi /etc/redis.conf
bind 0.0.0.0

# start redis
redis-server /etc/redis.conf &

Startup has succeeded when the log shows: The server is now ready to accept connections on port 6379
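
Besides reading the log line, the port can be probed directly. The following is a minimal sketch, assuming Python 3; port_is_open is a hypothetical helper name, and it only tests that something accepts TCP connections on the port, not that the listener is actually redis.

```python
import socket


def port_is_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, timed out, or unreachable: treat all as "not open".
        return False


if __name__ == "__main__":
    # 6379 is the port named in the startup message above.
    print(port_is_open("127.0.0.1", 6379))
```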

Firewall

Check the firewall status:

firewall-cmd --state

Two rules of my own:

iptables -A INPUT -s 127.0.0.1 -p tcp --dport 6379 -j ACCEPT
iptables -A INPUT -p tcp --dport 6379 -j DROP
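
The order of these two rules matters: iptables walks a chain from top to bottom and the first matching ACCEPT or DROP decides. The sketch below simulates that first-match behaviour in Python 3. It is a toy model under stated assumptions (a default policy of ACCEPT, matching on source address and destination port only), and RULES and verdict are hypothetical names, not an iptables API.

```python
# Each rule: (source address or None for "any", destination port, target).
RULES = [
    ("127.0.0.1", 6379, "ACCEPT"),
    (None, 6379, "DROP"),
]


def verdict(rules, source, dport, default="ACCEPT"):
    """Return the target of the first rule that matches the packet."""
    for rule_source, rule_port, target in rules:
        if rule_port == dport and rule_source in (None, source):
            return target
    # No rule matched: fall back to the (assumed) chain policy.
    return default


print(verdict(RULES, "127.0.0.1", 6379))
print(verdict(RULES, "10.0.0.5", 6379))
```

With the rules reversed, the DROP rule would match local traffic first, which is why the ACCEPT rule is appended before it.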

Stop firewalld:
systemctl stop firewalld.service # stop firewalld
systemctl disable firewalld.service # keep firewalld from starting at boot

If you are not sure how to configure the firewall, it is simplest to stop it.

Installing phantomjs

Download: wget https://bbuseruploads.s3.amazonaws.com/fd96ed93-2b32-46a7-9d2b-ecbc0988516a/downloads/396e7977-71fd-4592-8723-495ca4cfa7cc/phantomjs-2.1.1-linux-x86_64.tar.bz2?Signature=guF7TAUW11qr9nZXcTBHu7dg1ds%3D&Expires=1488510600&AWSAccessKeyId=AKIAIVFPT2YJYYZY3H4A&versionId=null&response-content-disposition=attachment%3B%20filename%3D%22phantomjs-2.1.1-linux-x86_64.tar.bz2%22

Download phantomjs-2.1.1-linux-x86_64.tar.bz2 into /root and unpack it.

Copy the phantomjs binary from the phantomjs/bin directory into /python2.7/bin.

Configuration file

====================================================================

The pyspider configuration file is as follows:

{
  "taskdb": "mysql+taskdb://pyspider:pyspider-pass@localhost:3306/taskdb",
  "projectdb": "mysql+projectdb://pyspider:pyspider-pass@localhost:3306/projectdb",
  "resultdb": "mysql+resultdb://pyspider:pyspider-pass@localhost:3306/resultdb",
  "message_queue": "redis://localhost:6379/db",
  "webui": {
    "port":5555,
    "username": "pyspider",
    "password": "pyspider-pass",
    "need-auth": true
  }
}

=========================================
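
The three database URLs in the configuration follow one pattern, so the file can also be produced from a few parameters. A minimal sketch, assuming Python 3; build_config is a hypothetical helper written for this guide, and the values are the ones shown above.

```python
import json


def build_config(db_user="pyspider", db_password="pyspider-pass",
                 db_host="localhost", db_port=3306):
    """Build the pyspider configuration shown above as a dict."""
    config = {}
    for name in ("taskdb", "projectdb", "resultdb"):
        # Same shape as the URLs above: mysql+<name>://user:pass@host:port/<name>
        config[name] = "mysql+{name}://{user}:{pw}@{host}:{port}/{name}".format(
            name=name, user=db_user, pw=db_password, host=db_host, port=db_port
        )
    config["message_queue"] = "redis://localhost:6379/db"
    config["webui"] = {
        "port": 5555,
        "username": "pyspider",
        "password": "pyspider-pass",
        "need-auth": True,
    }
    return config


print(json.dumps(build_config(), indent=2))
```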

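# for safety, create an unprivileged user to hold the configuration file
useradd -md /pyspider pyspider
# save the configuration file as
/pyspider/config.json
# set ownership and permissions
chown -R pyspider:pyspider /pyspider
chmod 400 /pyspider/config.json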
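
Because the file holds database and web UI passwords, it can be created with owner-only read permission from the start rather than tightened afterwards. A minimal sketch for Linux, assuming Python 3; write_private_json is a hypothetical helper name.

```python
import json
import os


def write_private_json(path, data):
    """Write data as JSON to path, readable only by the file owner."""
    # O_EXCL refuses to overwrite an existing file; 0o400 matches chmod 400.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o400)
    with os.fdopen(fd, "w") as handle:
        json.dump(data, handle, indent=2)
    # Set the mode explicitly in case the process umask changed it.
    os.chmod(path, 0o400)
```

Run as the pyspider user, a call such as write_private_json("/pyspider/config.json", config) would leave a file that only that user can read.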

Starting pyspider

/anaconda2/bin/pyspider -c /pyspider/config.json

The output looks like this:

# pyspider -c /pyspider/config.json 
[W 170516 17:45:05 __init__:54] redis DB must zero-based numeric index, using 0 instead
[I 170516 17:45:05 result_worker:49] result_worker starting...
[W 170516 17:45:06 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:06 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:06 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:06 __init__:54] redis DB must zero-based numeric index, using 0 instead
[I 170516 17:45:06 processor:211] processor starting...
[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead
[I 170516 17:45:07 tornado_fetcher:638] fetcher starting...
[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:07 __init__:54] redis DB must zero-based numeric index, using 0 instead
[I 170516 17:45:09 scheduler:782] scheduler.xmlrpc listening on 127.0.0.1:23333
[I 170516 17:45:09 scheduler:647] scheduler starting...
phantomjs fetcher running on port 25555
[I 170516 17:45:09 scheduler:586] in 5m: new:0,success:0,retry:0,failed:0
[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead
[W 170516 17:45:10 __init__:54] redis DB must zero-based numeric index, using 0 instead
[I 170516 17:45:10 app:76] webui running on 0.0.0.0:5555
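
The repeated warning says the redis DB must be a zero-based numeric index, while the message_queue URL in the configuration ends in /db. The sketch below shows how such a URL can be checked, assuming Python 3; redis_db_index is a hypothetical helper and is not claimed to be pyspider's own code. If this reading is right, ending the URL in /0 would likely silence the warning.

```python
from urllib.parse import urlsplit


def redis_db_index(url):
    """Return (index, ok): the DB index in a redis URL and whether it was numeric."""
    path = urlsplit(url).path.lstrip("/")
    first = path.split("/")[0]
    try:
        return int(first), True
    except ValueError:
        # Same fallback the warning describes: use database 0.
        return 0, False


print(redis_db_index("redis://localhost:6379/db"))  # (0, False)
print(redis_db_index("redis://localhost:6379/0"))   # (0, True)
```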

/// This part still has problems

Installing supervisor to monitor all processes

supervisor monitors the pyspider processes and restarts them immediately if they stop. Download supervisor-3.3.1 into /root and unpack it.

cd /root/supervisor-3.3.1
python setup.py install

pip install supervisor

Create the default configuration file and edit it

# /python2.7/bin/echo_supervisord_conf > /python2.7/conf/supervisor.conf

; Sample supervisor config file.
;
; For more information on the config file, please see:
; http://supervisord.org/configuration.html
;
; Notes:
;  - Shell expansion ("~" or "$HOME") is not supported.  Environment
;    variables can be expanded using this syntax: "%(ENV_HOME)s".
;  - Comments must have a leading space: "a=b ;comment" not "a=b;comment".

[unix_http_server]
file=/tmp/supervisor.sock   ; (the path to the socket file)
chmod=0700                 ; socket file mode (default 0700)
chown=root:root       ; socket file uid:gid owner
;username=user              ; (default is no username (open server))
;password=123               ; (default is no password (open server))

[inet_http_server]         ; inet (TCP) server disabled by default
port=127.0.0.1:9001        ; (ip_address:port specifier, *:port for all iface)
username=supervisor             ; (default is no username (open server))
password=123               ; (default is no password (open server))

[supervisord]
logfile=/tmp/supervisord.log ; (main log file;default $CWD/supervisord.log)
logfile_maxbytes=50MB        ; (max main logfile bytes b4 rotation;default 50MB)
logfile_backups=10           ; (num of main logfile rotation backups;default 10)
loglevel=info                ; (log level;default info; others: debug,warn,trace)
pidfile=/tmp/supervisord.pid ; (supervisord pidfile;default supervisord.pid)
nodaemon=false               ; (start in foreground if true;default false)
minfds=1024                  ; (min. avail startup file descriptors;default 1024)
minprocs=200                 ; (min. avail process descriptors;default 200)
;umask=022                   ; (process file creation umask;default 022)
;user=chrism                 ; (default is current user, required if root)
;identifier=supervisor       ; (supervisord identifier, default is 'supervisor')
;directory=/tmp              ; (default is not to cd during start)
;nocleanup=true              ; (don't clean up tempfiles at start;default false)
;childlogdir=/tmp            ; ('AUTO' child log dir, default $TEMP)
;environment=KEY="value"     ; (key value pairs to add to environment)
;strip_ansi=false            ; (strip ansi escape codes in logs; def. false)

; the below section must remain in the config file for RPC
; (supervisorctl/web interface) to work, additional interfaces may be
; added by defining them in separate rpcinterface: sections
[rpcinterface:supervisor]
supervisor.rpcinterface_factory = supervisor.rpcinterface:make_main_rpcinterface

[supervisorctl]
serverurl=unix:///tmp/supervisor.sock ; use a unix:// URL  for a unix socket
;serverurl=http://127.0.0.1:9001 ; use an http:// url to specify an inet socket
username=supervisor              ; should be same as http_username if set
password=123                ; should be same as http_password if set
prompt=mysupervisor         ; cmd line prompt (default "supervisor")
history_file=~/.sc_history  ; use readline history if available

; The below sample program section shows all possible program subsection values,
; create one or more 'real' program: sections to be able to control them under
; supervisor.

;[program:theprogramname]
;command=/bin/cat              ; the program (relative uses PATH, can take args)
;process_name=%(program_name)s ; process_name expr (default %(program_name)s)
;numprocs=1                    ; number of processes copies to start (def 1)
;directory=/tmp                ; directory to cwd to before exec (def no cwd)
;umask=022                     ; umask for process (default None)
;priority=999                  ; the relative start priority (default 999)
;autostart=true                ; start at supervisord start (default: true)
;startsecs=1                   ; # of secs prog must stay up to be running (def. 1)
;startretries=3                ; max # of serial start failures when starting (default 3)
;autorestart=unexpected        ; when to restart if exited after running (def: unexpected)
;exitcodes=0,2                 ; 'expected' exit codes used with autorestart (default 0,2)
;stopsignal=QUIT               ; signal used to kill process (default TERM)
;stopwaitsecs=10               ; max num secs to wait b4 SIGKILL (default 10)
;stopasgroup=false             ; send stop signal to the UNIX process group (default false)
;killasgroup=false             ; SIGKILL the UNIX process group (def false)
;user=chrism                   ; setuid to this UNIX account to run the program
;redirect_stderr=true          ; redirect proc stderr to stdout (default false)
;stdout_logfile=/a/path        ; stdout log path, NONE for none; default AUTO
;stdout_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
;stdout_logfile_backups=10     ; # of stdout logfile backups (default 10)
;stdout_capture_maxbytes=1MB   ; number of bytes in 'capturemode' (default 0)
;stdout_events_enabled=false   ; emit events on stdout writes (default false)
;stderr_logfile=/a/path        ; stderr log path, NONE for none; default AUTO
;stderr_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
;stderr_logfile_backups=10     ; # of stderr logfile backups (default 10)
;stderr_capture_maxbytes=1MB   ; number of bytes in 'capturemode' (default 0)
;stderr_events_enabled=false   ; emit events on stderr writes (default false)
;environment=A="1",B="2"       ; process environment additions (def no adds)
;serverurl=AUTO                ; override serverurl computation (childutils)

; The below sample eventlistener section shows all possible
; eventlistener subsection values, create one or more 'real'
; eventlistener: sections to be able to handle event notifications
; sent by supervisor.

;[eventlistener:theeventlistenername]
;command=/bin/eventlistener    ; the program (relative uses PATH, can take args)
;process_name=%(program_name)s ; process_name expr (default %(program_name)s)
;numprocs=1                    ; number of processes copies to start (def 1)
;events=EVENT                  ; event notif. types to subscribe to (req'd)
;buffer_size=10                ; event buffer queue size (default 10)
;directory=/tmp                ; directory to cwd to before exec (def no cwd)
;umask=022                     ; umask for process (default None)
;priority=-1                   ; the relative start priority (default -1)
;autostart=true                ; start at supervisord start (default: true)
;startsecs=1                   ; # of secs prog must stay up to be running (def. 1)
;startretries=3                ; max # of serial start failures when starting (default 3)
;autorestart=unexpected        ; autorestart if exited after running (def: unexpected)
;exitcodes=0,2                 ; 'expected' exit codes used with autorestart (default 0,2)
;stopsignal=QUIT               ; signal used to kill process (default TERM)
;stopwaitsecs=10               ; max num secs to wait b4 SIGKILL (default 10)
;stopasgroup=false             ; send stop signal to the UNIX process group (default false)
;killasgroup=false             ; SIGKILL the UNIX process group (def false)
;user=chrism                   ; setuid to this UNIX account to run the program
;redirect_stderr=false         ; redirect_stderr=true is not allowed for eventlisteners
;stdout_logfile=/a/path        ; stdout log path, NONE for none; default AUTO
;stdout_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
;stdout_logfile_backups=10     ; # of stdout logfile backups (default 10)
;stdout_events_enabled=false   ; emit events on stdout writes (default false)
;stderr_logfile=/a/path        ; stderr log path, NONE for none; default AUTO
;stderr_logfile_maxbytes=1MB   ; max # logfile bytes b4 rotation (default 50MB)
;stderr_logfile_backups=10     ; # of stderr logfile backups (default 10)
;stderr_events_enabled=false   ; emit events on stderr writes (default false)
;environment=A="1",B="2"       ; process environment additions
;serverurl=AUTO                ; override serverurl computation (childutils)

; The below sample group section shows all possible group values,
; create one or more 'real' group: sections to create "heterogeneous"
; process groups.

;[group:thegroupname]
;programs=progname1,progname2  ; each refers to 'x' in [program:x] definitions
;priority=999                  ; the relative start priority (default 999)

; The [include] section can just contain the "files" setting.  This
; setting can list multiple files (separated by whitespace or
; newlines).  It can also contain wildcards.  The filenames are
; interpreted as relative to this file.  Included files *cannot*
; include files themselves.

;[include]
;files = relative/directory/*.ini
[group:pyspider]
programs=pyspider-fetcher,pyspider-processor

[program:pyspider-fetcher]
command=/python2.7/bin/pyspider -c /pyspider/config.json fetcher
autorestart=true
autostart=true
user=root
group=pyspider
stopasgroup=true

[program:pyspider-processor]
command=/python2.7/bin/pyspider -c /pyspider/config.json processor
autorestart=true
autostart=true
user=root
group=pyspider
stopasgroup=true
stderr_logfile=/var/Spider/Log/Process/spider_process_err.log
stdout_logfile=/var/Spider/Log/Process/spider_process_out.log
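
The program sections above repeat the same settings for each pyspider component. A minimal sketch, assuming Python 3 and its standard configparser module, that renders such sections from a list of component names. program_sections is a hypothetical helper, and it deliberately leaves out the user, group and log file settings shown above.

```python
import configparser
import io


def program_sections(components, pyspider="/python2.7/bin/pyspider",
                     config="/pyspider/config.json"):
    """Render one [program:pyspider-<name>] section per component."""
    parser = configparser.ConfigParser(interpolation=None)
    for name in components:
        parser["program:pyspider-" + name] = {
            "command": "{} -c {} {}".format(pyspider, config, name),
            "autorestart": "true",
            "autostart": "true",
            "stopasgroup": "true",
        }
    # Group section listing every generated program.
    parser["group:pyspider"] = {
        "programs": ",".join("pyspider-" + name for name in components)
    }
    buffer = io.StringIO()
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(program_sections(["fetcher", "processor"]))
```

The output can be appended to supervisor.conf; other component names can be passed in the same way.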

Starting supervisor

# supervisord -c /etc/supervisor.conf

Note: after changing config.json, reload supervisor:

# supervisorctl reload

pyspider is now installed.

Logging in to pyspider

http://ip:5555/


Troubleshooting:

ImportError: pycurl: libcurl link-time ssl backend (nss) is different from compile-time ssl backend (none/other)

# pip uninstall pycurl
# export PYCURL_SSL_LIBRARY=nss
# pip install pycurl

ImportError: No module named _sqlite3

# find / -name _sqlite*.so
/usr/lib64/python2.7/lib-dynload/_sqlite3.so
/usr/lib64/python2.7/site-packages/_sqlitecache.so

# cp /usr/lib64/python2.7/lib-dynload/_sqlite3.so /python2.7/lib/python2.7/lib-dynload/
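
Both errors above surface only when the module is first imported, so it can help to test the imports directly after applying a fix. A minimal sketch, assuming Python 3 syntax (on the Python 2.7 build from this guide the same idea applies); check_imports is a hypothetical helper name.

```python
import importlib


def check_imports(names=("pycurl", "_sqlite3")):
    """Try to import each module; map its name to None or the error text."""
    results = {}
    for name in names:
        try:
            importlib.import_module(name)
            results[name] = None
        except ImportError as error:
            results[name] = str(error)
    return results


for module, problem in check_imports().items():
    print(module, "OK" if problem is None else "FAILED: " + problem)
```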
