Here are three ways to get cookies.
(1) Using a handler
This is the method described most often online, but it is rather cumbersome to use:
from http import cookiejar
from urllib import request
import base64
import requests

class Craw():
    def __init__(self):
        self.url = ''
        self.headers = {}  # the dict must exist before keys are assigned to it
        self.headers['User-Agent'] = ('Mozilla/5.0 (Windows NT 6.3; Win64; x64) '
                                      'AppleWebKit/537.36 (KHTML, like Gecko) Chrome/65.0.3325.162 Safari/537.36')
        self.headers['Content-Type'] = 'application/x-www-form-urlencoded'

    def getCookies(self):
        # HTTPCookieProcessor stores every cookie the server sets into the CookieJar
        cookie = cookiejar.CookieJar()
        handler = request.HTTPCookieProcessor(cookie)
        opener = request.build_opener(handler)
        response = opener.open(self.url)
        # Join the cookies into a "name=value;" string for the Cookie header
        cookieValue = ''
        for item in cookie:
            cookieValue += item.name + '=' + item.value + ';'
        self.headers['Cookie'] = cookieValue
        response = requests.get(url=self.url, headers=self.headers)  # send the collected cookies along

    def getVerificationCode(self):
        img_url = ''
        imgResponse = requests.get(url=img_url, headers=self.headers)  # simply reuse the headers
        base64_jpg = base64.b64encode(imgResponse.content)
        return base64_jpg
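The loop in getCookies that turns the CookieJar into a header string can be tried without any network access. A minimal sketch using only the standard library: `make_cookie` and `jar_to_cookie_header` are hypothetical helper names, and the cookies are built by hand to stand in for ones HTTPCookieProcessor would collect from a real response.

```python
from http import cookiejar


def make_cookie(name: str, value: str, domain: str = 'example.com') -> cookiejar.Cookie:
    # Hypothetical helper: builds a plain Netscape-style cookie by hand,
    # standing in for one extracted from a server response.
    return cookiejar.Cookie(
        version=0, name=name, value=value,
        port=None, port_specified=False,
        domain=domain, domain_specified=False, domain_initial_dot=False,
        path='/', path_specified=True,
        secure=False, expires=None, discard=True,
        comment=None, comment_url=None, rest={},
    )


def jar_to_cookie_header(jar: cookiejar.CookieJar) -> str:
    # Iterating a CookieJar yields Cookie objects with .name and .value,
    # the same attributes getCookies reads.
    return '; '.join(f'{c.name}={c.value}' for c in jar)


jar = cookiejar.CookieJar()
jar.set_cookie(make_cookie('a', 'a1'))
jar.set_cookie(make_cookie('b', 'b1'))
print(jar_to_cookie_header(jar))  # a=a1; b=b1
```

Iterating a CookieJar returns cookies sorted by domain, path and name, so the order in the header is stable. The sketch joins pairs with '; ', the usual separator in a Cookie header, instead of the trailing ';' used above.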
(2) Using the Set-Cookie field of the response headers
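import requests
import re

class Crawler():
    # self.url and self.headers are assumed to be set up in __init__, as in method (1)
    def getCookie(self):
        response = requests.post(self.url)
        # requests may merge several Set-Cookie headers into one comma-separated string,
        # so split on both ';' and ','
        set_cookie = response.headers['Set-Cookie']
        array = re.split('[;,]', set_cookie)
        cookieValue = ''
        for arr in array:
            # keep only the two session cookies this site needs
            if arr.find('DZSW_SESSIONID') >= 0 or arr.find('bl0gm1HBTB') >= 0:
                cookieValue += arr.strip() + ';'
        self.headers['Cookie'] = cookieValue  # stored the same way as in methods (1) and (3)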
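Splitting on ',' also cuts an Expires date such as 'Wed, 21 Oct 2015 ...' in two, which is why the loop filters by cookie name. A small standalone sketch of the same filtering that compares the exact name before '=' instead of using find(); `pick_cookies` and the sample header value are hypothetical.

```python
import re


def pick_cookies(set_cookie: str, wanted: set) -> str:
    # Split the merged Set-Cookie value the same way method (2) does
    pairs = {}
    for part in re.split('[;,]', set_cookie):
        name, sep, value = part.strip().partition('=')
        # Match the exact cookie name, so attributes such as Path or Expires
        # and fragments without '=' (HttpOnly, date pieces) are skipped
        if sep and name in wanted:
            pairs[name] = value  # a later duplicate overwrites an earlier one
    return '; '.join(f'{name}={value}' for name, value in pairs.items())


# Hypothetical sample value; a real header depends on the site
header = ('DZSW_SESSIONID=abc123; Path=/; HttpOnly, '
          'bl0gm1HBTB=xyz; Expires=Wed, 21 Oct 2015 07:28:00 GMT; Path=/')
print(pick_cookies(header, {'DZSW_SESSIONID', 'bl0gm1HBTB'}))  # DZSW_SESSIONID=abc123; bl0gm1HBTB=xyz
```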
(3) Using the response's cookies attribute
Only the getCookie method is shown; the code is as follows:
import requests

class Crawler():
    def getCookie(self):
        response = requests.get(self.url)
        # response.cookies is a RequestsCookieJar; items() yields (name, value) pairs
        cookie_value = ''
        for key, value in response.cookies.items():
            cookie_value += key + '=' + value + ';'
        self.headers['Cookie'] = cookie_value
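All three methods copy cookies into a Cookie header by hand. A different technique, not one of the three above, is requests.Session, which stores cookies from every response in session.cookies and attaches them to later requests. The sketch below stays offline: the cookies are set by hand in place of a real Set-Cookie response, http://example.com/ is a placeholder URL, and the request is only prepared, never sent.

```python
import requests

session = requests.Session()
# Normally these would arrive via Set-Cookie on an earlier session.get(...);
# they are set by hand here so nothing goes over the network.
session.cookies.set('a', 'a1')
session.cookies.set('b', 'b1')

# prepare_request merges session.cookies into the outgoing request's Cookie header
prepared = session.prepare_request(requests.Request('GET', 'http://example.com/'))
print(prepared.headers.get('Cookie'))
```

With a session, the manual string building in getCookie is no longer needed: later calls such as session.get(...) carry the cookies automatically.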
I also have a question. response.cookies returns a RequestsCookieJar object, and I came across code like this in a book:
import requests

class Crawl():
    def __init__(self):
        self.url = ''
        self.headers = {}
        self.cookieJar = requests.cookies.RequestsCookieJar()

    def getCookie(self):
        cookies = 'a=a1;b=b1'  # a plain string
        for cookie in cookies.split(';'):
            # split on the first '=' only, so values containing '=' stay intact
            key, value = cookie.split('=', 1)
            self.cookieJar.set(key, value)
        r = requests.get(self.url, cookies=self.cookieJar, headers=self.headers)
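requests documents that the cookies= argument accepts a plain dict as well as a CookieJar, so the string can also be turned into a dict. A minimal sketch; `parse_cookie_string` is a hypothetical helper. It skips empty items because the strings built in methods (1) and (3) end with ';', and on such a string the book's `key, value = cookie.split('=', 1)` raises ValueError for the empty last piece.

```python
def parse_cookie_string(raw: str) -> dict:
    # Turn "a=a1;b=b1" into {"a": "a1", "b": "b1"}
    cookies = {}
    for item in raw.split(';'):
        item = item.strip()
        if not item:
            continue  # tolerate a trailing ';' such as "a=a1;b=b1;"
        key, _, value = item.partition('=')  # split on the first '=' only
        cookies[key] = value
    return cookies


print(parse_cookie_string('a=a1;b=b1;'))  # {'a': 'a1', 'b': 'b1'}
```

The result could then be passed as `requests.get(self.url, cookies=parse_cookie_string(raw))`.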
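Why can't response.cookies be assigned directly to the cookies of the final request? I tried it and got an error, so in the end I went with method (2).
I'd be grateful if someone familiar with this could explain.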
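On the question itself: requests documents that cookies= takes a dict or a CookieJar, and RequestsCookieJar subclasses http.cookiejar.CookieJar, so passing response.cookies directly is expected to be accepted; the error in the original attempt may have had another cause that this sketch cannot determine. One behavior worth knowing is that a jar only attaches a cookie to requests whose host matches the cookie's domain, and cookies received from a server carry that server's domain. A minimal offline sketch with a hand-built jar standing in for response.cookies; the host names and cookie are placeholders.

```python
import requests

# Stand-in for response.cookies: a jar holding one cookie bound to example.com
jar = requests.cookies.RequestsCookieJar()
jar.set('sid', 'x1', domain='example.com', path='/')

# The jar object itself is passed as cookies=; the requests are only prepared, not sent
same_host = requests.Request('GET', 'http://example.com/', cookies=jar).prepare()
other_host = requests.Request('GET', 'http://another-site.org/', cookies=jar).prepare()

print(same_host.headers.get('Cookie'))   # cookie attached for the matching host
print(other_host.headers.get('Cookie'))  # no Cookie header for a different host
```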