6. Writing a Qichacha (企查查) crawler with requests

(Not fully polished yet, but it does retrieve the data.)

The Qichacha crawler code is as follows:

# encoding: utf-8
import requests
from lxml import etree
import random
import re

# Target URLs to scrape
base_url1 = 'http://m.qichacha.com'
base_url = 'https://m.qichacha.com/search?key='

user_agent = [
    "Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/536.11 (KHTML, like Gecko) Chrome/20.0.1132.11 TaoBrowser/2.0 Safari/536.11",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.1 (KHTML, like Gecko) Chrome/21.0.1180.71 Safari/537.1 LBBROWSER",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; LBBROWSER) ",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E; LBBROWSER)",
    "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; WOW64; Trident/5.0; SLCC2; .NET CLR 2.0.50727; .NET CLR 3.5.30729; .NET CLR 3.0.30729; Media Center PC 6.0; .NET4.0C; .NET4.0E; QQBrowser/7.0.3698.400)",
    "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; QQDownload 732; .NET4.0C; .NET4.0E) ",
    "Mozilla/5.0 (Windows NT 5.1) AppleWebKit/535.11 (KHTML, like Gecko) Chrome/17.0.963.84 Safari/535.11 SE 2.X MetaSr 1.0",
    "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1; Trident/4.0; SV1; QQDownload 732; .NET4.0C; .NET4.0E; SE 2.X MetaSr 1.0) ",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Maxthon/4.4.3.4000 Chrome/30.0.1599.101 Safari/537.36",
    "Mozilla/5.0 (Windows NT 6.1; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/38.0.2125.122 UBrowser/4.0.3214.0 Safari/537.36",
]
cookie = [
    'UM_distinctid=16518280d1e3e5-00788a93df88f7-5b193413-1fa400-16518280d1f949; zg_did=%7B%22did%22%3A%20%2216518280dd3b45-0da4e4ab13f793-5b193413-1fa400-16518280dd51f6%22%7D; acw_tc=7b81f49815356947470295339e1fc37a590000ea6190b3cf75ab42853b; PHPSESSID=4l787fmr2v90mh2khj8n6n64l5; CNZZDATA1254842228=226405416-1535690886-null%7C1535690886; Hm_lvt_3456bee468c83cc63fb5147f119f1075=1535595956,1535618789,1535618810,1535694746; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201535694746927%2C%22updated%22%3A%201535695379242%2C%22info%22%3A%201535595953976%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22m.baidu.com%22%2C%22cuid%22%3A%20%227d775544e16a1cc0d0ab63b42b4b8aef%22%7D; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1535695379',
    'UM_distinctid=16518280d1e3e5-00788a93df88f7-5b193413-1fa400-16518280d1f949; zg_did=%7B%22did%22%3A%20%2216518280dd3b45-0da4e4ab13f793-5b193413-1fa400-16518280dd51f6%22%7D; acw_tc=7b81f49815356947470295339e1fc37a590000ea6190b3cf75ab42853b; PHPSESSID=4l787fmr2v90mh2khj8n6n64l5; CNZZDATA1254842228=226405416-1535690886-null%7C1535690886; Hm_lvt_3456bee468c83cc63fb5147f119f1075=1535595956,1535618789,1535618810,1535694746; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201535694746927%2C%22updated%22%3A%201535695791508%2C%22info%22%3A%201535595953976%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22m.baidu.com%22%2C%22cuid%22%3A%20%227d775544e16a1cc0d0ab63b42b4b8aef%22%7D; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1535695792',
    'UM_distinctid=16518280d1e3e5-00788a93df88f7-5b193413-1fa400-16518280d1f949; zg_did=%7B%22did%22%3A%20%2216518280dd3b45-0da4e4ab13f793-5b193413-1fa400-16518280dd51f6%22%7D; acw_tc=7b81f49815356947470295339e1fc37a590000ea6190b3cf75ab42853b; PHPSESSID=4l787fmr2v90mh2khj8n6n64l5; CNZZDATA1254842228=226405416-1535690886-null%7C1535690886; Hm_lvt_3456bee468c83cc63fb5147f119f1075=1535595956,1535618789,1535618810,1535694746; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201535694746927%2C%22updated%22%3A%201535695924595%2C%22info%22%3A%201535595953976%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22m.baidu.com%22%2C%22cuid%22%3A%20%227d775544e16a1cc0d0ab63b42b4b8aef%22%7D; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1535695925',
    'UM_distinctid=16518280d1e3e5-00788a93df88f7-5b193413-1fa400-16518280d1f949; zg_did=%7B%22did%22%3A%20%2216518280dd3b45-0da4e4ab13f793-5b193413-1fa400-16518280dd51f6%22%7D; acw_tc=7b81f49815356947470295339e1fc37a590000ea6190b3cf75ab42853b; PHPSESSID=4l787fmr2v90mh2khj8n6n64l5; CNZZDATA1254842228=226405416-1535690886-null%7C1535690886; Hm_lvt_3456bee468c83cc63fb5147f119f1075=1535595956,1535618789,1535618810,1535694746; zg_de1d1a35bfa24ce29bbf2c7eb17e6c4f=%7B%22sid%22%3A%201535694746927%2C%22updated%22%3A%201535696003819%2C%22info%22%3A%201535595953976%2C%22superProperty%22%3A%20%22%7B%7D%22%2C%22platform%22%3A%20%22%7B%7D%22%2C%22utm%22%3A%20%22%7B%7D%22%2C%22referrerDomain%22%3A%20%22m.baidu.com%22%2C%22cuid%22%3A%20%227d775544e16a1cc0d0ab63b42b4b8aef%22%7D; Hm_lpvt_3456bee468c83cc63fb5147f119f1075=1535696005'
]
# Request header settings: one random User-Agent and cookie, chosen once
headers = {
    'User-agent': random.choice(user_agent),
    'cookie': random.choice(cookie)
}

name_list = ['成都创信广告有限公司']

for name in name_list:
    start_url = base_url + str(name)
    print(start_url)
    response = requests.get(start_url, headers=headers)
    _response = response.text
    # print(_response)
    # content = etree.HTML(_response)
    # print(content)
    # Extract the link to the first search result (a site-relative href)
    search_url = re.findall('</div> <a href="(.*?)" class="a-decoration"> <div class="list-item"> <div class="list-item-top">', _response)
    if not search_url:
        # No match: the search was blocked or the page layout has changed
        print('No search result found for: ' + name)
        continue
    url = base_url1 + search_url[0]
    # print(url)
    # print('*'*100)
    response1 = requests.get(url, headers=headers)
    _response1 = response1.text
    # Company name
    company_name = re.findall('<div class="company-name">(.*?)<', _response1)[0]
    print('Company name: ' + company_name)
    # Legal representative
    legal_person = re.findall('<a   href=".*?">(.*?)</a>', _response1)[0]
    print('Legal representative: ' + legal_person)
    # Telephone
    telephone = re.findall('<a href="tel:.*?" class="phone a-decoration">(.*?)</a>', _response1)[0]
    print('Telephone: ' + telephone)
    # # Address
    # address=re.findall('</div> <div class="address">(.*?)</div> </div>',_response1)[0]
    # print(address)
    # # print('Address: '+address)

    # # Registration number
    # registration_number=re.findall('</div><div class="basic-item-right">(.*?)</div>',_response1)
    # print(registration_number)
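
The script indexes `[0]` on the result of `re.findall` for each field, so a blocked request or a changed page layout ends the run with an `IndexError`. A minimal sketch of a more defensive variant is shown below. The helper names `first_match`, `build_headers`, and `parse_company`, and the sample markup, are hypothetical and not part of the original script; the two patterns are the ones used above. It also draws a new User-Agent and cookie for each call instead of once at start-up.

```python
import random
import re


def first_match(pattern, text, default=None):
    """Return the first capture group of pattern in text, or default if absent."""
    found = re.search(pattern, text, re.S)
    return found.group(1).strip() if found else default


def build_headers(user_agents, cookies):
    """Pick a random User-Agent and cookie; call once per request to rotate them."""
    return {
        'User-Agent': random.choice(user_agents),
        'Cookie': random.choice(cookies),
    }


def parse_company(html):
    """Extract the fields from a detail page; missing fields come back as None."""
    return {
        'company_name': first_match(r'<div class="company-name">(.*?)<', html),
        'telephone': first_match(
            r'<a href="tel:.*?" class="phone a-decoration">(.*?)</a>', html),
    }


# Made-up markup for illustration only; the real page layout may differ.
sample_html = (
    '<div class="company-name">Example Co., Ltd.</div>'
    '<a href="tel:000-0000" class="phone a-decoration">000-0000</a>'
)
info = parse_company(sample_html)
print(info)
```

In the loop, `parse_company(_response1)` could then replace the individual `re.findall(...)[0]` lines, with a check for `None` before printing each field.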

The result of running the script is shown in the figure below:

[Figure 1: execution result of the Qichacha crawler]
