Basic Usage of requests, a Fundamental Library for Python Web Scraping

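requests

When you use urllib to handle web authentication and cookies, you have to write an Opener and Handlers, which is inconvenient. Here we look at a more powerful library, requests.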

get()

Example:

import requests  # import requests
html = requests.get('https://www.csdn.net/')  # fetch the page with the get method
print(html.text)  # read the text attribute to view the page source

To add query parameters, pass a dictionary via the params argument

import requests  # import requests
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('https://sou.zhaopin.com/', params=data)  # add the query parameters
print(html.text)  # read the text attribute to view the page source
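To see what params does to the request, the following minimal sketch builds the request without sending it and prints the final URL. It relies on requests.Request and prepare(), which the article itself does not use, and it needs no network access.

```python
import requests

data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}

# Build the request object without sending it (no network access needed).
request = requests.Request('GET', 'https://sou.zhaopin.com/', params=data)
prepared = request.prepare()

# The dictionary is URL-encoded and appended to the URL as a query string.
print(prepared.url)
```

The printed URL is the address requests.get would have contacted, which makes this a convenient way to check parameters before scraping.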

To add request headers, pass a dictionary via the headers argument

import requests  # import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
data = {
    'jl': '765',
    'kw': 'python',
    'kt': '3'
}
html = requests.get('https://sou.zhaopin.com/', headers=headers, params=data)  # add the headers and query parameters
print(html.text)  # read the text attribute to view the page source
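One reason to set User-Agent yourself is that requests otherwise announces itself as a script, which some sites reject. The sketch below compares the default value with a custom one. It prepares the request through a Session without sending it; requests.utils.default_user_agent and Session.prepare_request are requests functions not used in the article.

```python
import requests

custom_ua = ('Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
             '(KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36')

# What requests sends when no User-Agent is given ("python-requests/<version>").
default_ua = requests.utils.default_user_agent()
print(default_ua)

# Prepare (but do not send) a request through a session to see the final headers.
session = requests.Session()
request = requests.Request('GET', 'https://sou.zhaopin.com/',
                           headers={'User-Agent': custom_ua})
prepared = session.prepare_request(request)

# The header passed with the request overrides the session default.
print(prepared.headers['User-Agent'])
```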

Advanced usage

Cookie settings, proxy settings, and so on

Cookies

Getting cookies:

import requests  # import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
html = requests.get('https://blog.csdn.net/qq_40966461/article/details/104974998', headers=headers)  # add the headers
print(html.cookies)  # read the cookies attribute to view the cookies the server set
for key, value in html.cookies.items():
    print(key + '=' + value)

It is simple: just read the cookies attribute of the response.
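The cookies attribute is a RequestsCookieJar. The following sketch builds one by hand so it can be explored without a network request; the cookie names, values, and domain are made up for illustration.

```python
import requests

# Build a cookie jar by hand; the names, values and domain are made up.
jar = requests.cookies.RequestsCookieJar()
jar.set('uuid', 'abc123', domain='example.com', path='/')
jar.set('theme', 'dark', domain='example.com', path='/')

# A jar can be iterated like a dictionary, exactly as with html.cookies.
for key, value in jar.items():
    print(key + '=' + value)

# Convert the jar to a plain dict when only names and values matter.
cookie_dict = requests.utils.dict_from_cookiejar(jar)
print(cookie_dict)
```

A jar or a plain dictionary like this can also be sent back to a server through the cookies argument of requests.get.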

Maintaining a session with Session()

In requests, calling get() or post() directly does simulate a web request, but each call is effectively a separate session, as if you had opened the pages in two different browsers. To keep state such as cookies across requests, use a Session object.

import requests  # import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
session = requests.Session()  # create the session once and reuse it for every request
html = session.get('https://blog.csdn.net/qq_40966461/article/details/104974998', headers=headers)
print(html.cookies)  # cookies set by this response
for key, value in html.cookies.items():
    print(key + '=' + value)

Create a Session object first, then call get() on that object instead of on the requests module.
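The difference a session makes can be seen without contacting any server. In this sketch a cookie is placed in the session by hand, standing in for one a server set earlier, and the same request is then prepared through that session and through a fresh one. The cookie value and the example.com URL are placeholders.

```python
import requests

session = requests.Session()
# Simulate a cookie the server set on an earlier response (made-up value).
session.cookies.set('token', 'abc')

request = requests.Request('GET', 'https://example.com/')

# Prepared through the existing session: the stored cookie is attached.
with_session = session.prepare_request(request)

# Prepared through a brand-new session: nothing is remembered.
fresh = requests.Session().prepare_request(request)

print(with_session.headers.get('Cookie'))
print(fresh.headers.get('Cookie'))
```

This is why a login performed with session.post carries over to later session.get calls, while separate requests.get calls start from scratch each time.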

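SSL certificate verification

import requests  # import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
# verify only matters for https:// URLs, so the request uses HTTPS here
response = requests.get('https://www.12306.cn', headers=headers, verify=False)
print(response.status_code)

Passing verify=False is enough to skip certificate verification. requests then emits an InsecureRequestWarning for each such request.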
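A small sketch of two related settings, neither of which appears in the article: silencing the warning through urllib3 (the library requests is built on), and setting verify once on a Session so that it applies to every request made through it. Nothing is sent over the network here.

```python
import requests
import urllib3

# Silence the InsecureRequestWarning that unverified HTTPS requests trigger.
urllib3.disable_warnings(urllib3.exceptions.InsecureRequestWarning)

session = requests.Session()
default_verify = session.verify  # certificate verification is on by default
session.verify = False           # requests made through this session skip verification

print(default_verify, session.verify)
```

verify also accepts the path to a CA bundle file, which keeps verification on when a site uses a certificate the default bundle does not know.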

Proxy settings

import requests  # import requests
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.149 Safari/537.36'
}
proxies = {
    "http": "http://183.166.132.176",
    "https": "https://183.166.132.176"
}
response = requests.get('http://www.12306.cn', headers=headers, proxies=proxies, verify=False)
print(response.status_code)

Just add the proxies argument. Proxy addresses normally take the form scheme://host:port; you can find proxies by searching for 快代理 (Kuaidaili).
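The keys of the proxies dictionary are matched against the scheme of the target URL. The sketch below uses requests.utils.select_proxy, a helper in requests that the article does not mention, to show which proxy would be chosen. The address 127.0.0.1:8080 is a placeholder, not a working proxy.

```python
import requests

# Placeholder proxy address (an assumption); replace it with a proxy you control.
proxies = {
    'http': 'http://127.0.0.1:8080',
    'https': 'http://127.0.0.1:8080',
}

# select_proxy returns the proxy requests would pick for a URL, or None.
http_proxy = requests.utils.select_proxy('http://www.12306.cn', proxies)
ftp_proxy = requests.utils.select_proxy('ftp://example.com/file', proxies)

print(http_proxy)  # the 'http' entry matches the URL scheme
print(ftp_proxy)   # no entry for 'ftp', so no proxy is used
```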

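Timeout settings

Add the argument timeout=1 to give up after 1 second; if the server does not respond in time, requests raises an exception.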
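A minimal sketch of how a timeout is usually combined with error handling. The helper name fetch_text is hypothetical, and returning None on failure is just one possible design.

```python
import requests


def fetch_text(url, timeout=1):
    """Return the page text, or None if the request times out or fails."""
    try:
        response = requests.get(url, timeout=timeout)
        response.raise_for_status()
    except requests.exceptions.Timeout:
        # The server did not respond within `timeout` seconds.
        return None
    except requests.exceptions.RequestException:
        # Any other requests error (connection refused, HTTP 4xx/5xx, ...).
        return None
    return response.text
```

Calling fetch_text('https://www.csdn.net/') would return the page source, or None if the site is too slow. timeout also accepts a (connect, read) tuple when the two phases need different limits.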

Authentication

Pass the argument auth=('username', 'password') to get()

OAuth authentication
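To show what the auth tuple turns into, this sketch prepares two requests without sending them: one with the tuple and one with requests.auth.HTTPBasicAuth, for which the tuple is shorthand. The URL is a placeholder and the credentials are the literal strings from the text above.

```python
import requests
from requests.auth import HTTPBasicAuth

url = 'https://example.com/protected'  # placeholder URL (an assumption)

# A (username, password) tuple is shorthand for HTTP Basic authentication.
with_tuple = requests.Request('GET', url, auth=('username', 'password')).prepare()
with_class = requests.Request('GET', url, auth=HTTPBasicAuth('username', 'password')).prepare()

# Both forms produce the same "Authorization: Basic ..." header.
print(with_tuple.headers['Authorization'])
print(with_class.headers['Authorization'])
```

Because Basic authentication only base64-encodes the credentials, it should be used over HTTPS.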
