Preface
I have written about CDP before. Recently I needed to capture network requests during Selenium automation, so this post records how to do it.
Continuing from the previous post: "Browser Automation: You Must Know the CDP Protocol"
Implementation
Enable logging
First, we need to turn on the browser's log recording. This is done by configuring a capability, which lets us define certain browser characteristics.
import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_option = Options()
# Enable Chrome's performance log, which carries the CDP network events
chrome_option.set_capability("goog:loggingPrefs", {"performance": "ALL"})
service = Service("./chromedriver-win64/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=chrome_option)
In recent versions of Selenium, browser properties are configured through Options(). Setting the capability
'goog:loggingPrefs': {'performance': 'ALL'}
turns on the browser's performance logging.
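As a side note, the same goog:loggingPrefs capability accepts other log types as well. The sketch below is an optional variant, not required for this task, that also enables the browser console log:

chrome_option = Options()
chrome_option.set_capability(
    "goog:loggingPrefs",
    # "performance" feeds get_log("performance"); "browser" feeds get_log("browser") with console output
    {"performance": "ALL", "browser": "ALL"},
)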
Get the logs
Next, let's try to grab the logs generated by visiting Baidu.
driver.get("https://www.baidu.com")
performance_log = driver.get_log("performance")
After visiting Baidu, get_log("performance") retrieves the performance log, which is a list of dictionaries. Here is one entry printed out so you can see the format:
{'level':'INFO','message':'{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://www.baidu.com/","frameId":"62715239374117F099DBA348C45736CD","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"C5F286A8A5744DEEB277D0718C4E34E8","redirectHasExtraInfo":false,"request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36","sec-ch-ua":"\"Not/A)Brand\";v=\"99\", \"Google Chrome\";v=\"115\", \"Chromium\";v=\"115\"","sec-ch-ua-mobile":"?0","sec-ch-ua-platform":"\"Windows\""},"initialPriority":"VeryHigh","isSameSite":true,"method":"GET","mixedContentType":"none","referrerPolicy":"strict-origin-when-cross-origin","url":"https://www.baidu.com/"},"requestId":"C5F286A8A5744DEEB277D0718C4E34E8","timestamp":1508.943033,"type":"Document","wallTime":1692620664.219256}},"webview":"62715239374117F099DBA348C45736CD"}','timestamp':1692620664216}
As you can see, the key information is all inside message. Note that message is itself a JSON string, so it has to be parsed:
{"message":{"method":"Network.requestWillBeSent","params":{"documentURL":"https://www.baidu.com/","frameId":"62715239374117F099DBA348C45736CD","hasUserGesture":false,"initiator":{"type":"other"},"loaderId":"C5F286A8A5744DEEB277D0718C4E34E8","redirectHasExtraInfo":false,"request":{"headers":{"Upgrade-Insecure-Requests":"1","User-Agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/115.0.0.0 Safari/537.36","sec-ch-ua":"\"Not/A)Brand\";v=\"99\", \"Google Chrome\";v=\"115\", \"Chromium\";v=\"115\"","sec-ch-ua-mobile":"?0","sec-ch-ua-platform":"\"Windows\""},"initialPriority":"VeryHigh","isSameSite":true,"method":"GET","mixedContentType":"none","referrerPolicy":"strict-origin-when-cross-origin","url":"https://www.baidu.com/"},"requestId":"C5F286A8A5744DEEB277D0718C4E34E8","timestamp":1508.943033,"type":"Document","wallTime":1692620664.219256}},"webview":"62715239374117F099DBA348C45736CD"}
This is an example of a request event. Every entry whose method starts with Network is a network event.
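To get a feel for which network events were recorded, you can scan the log and collect the distinct Network methods. A minimal sketch, reusing the performance_log from above (the variable names here are my own):

# Collect the distinct Network.* event names that appear in the performance log
network_methods = set()
for packet in performance_log:
    message = json.loads(packet["message"])["message"]
    if message["method"].startswith("Network"):
        network_methods.add(message["method"])
print(network_methods)  # e.g. Network.requestWillBeSent, Network.responseReceived, ...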
Get the response body
What I actually need is the response data of a request. The performance log does contain Network.response* events, but they do not include the full response body.
So we need the requestId field and fetch the body through CDP:
for packet in performance_log:
    message = json.loads(packet.get("message")).get("message")
    packet_method = message.get("method")
    if "Network" in packet_method:
        request_id = message.get("params").get("requestId")
        # Ask CDP for the response body that belongs to this requestId
        resp = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
        body = resp.get("body")
With this, the complete response body can be retrieved.
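One detail worth noting: Network.getResponseBody also returns a base64Encoded flag, and binary responses (images and the like) come back base64-encoded. A small sketch of handling it (the decoding step is my addition, not part of the original code):

import base64

resp = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
if resp.get("base64Encoded"):
    body = base64.b64decode(resp["body"])  # raw bytes for binary responses
else:
    body = resp["body"]  # plain text for HTML/JSON/JS responses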
If you have more specific needs, for example grabbing the response body of one particular URL, you can inspect the event data and filter with a condition, as sketched below.
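For instance, here is a sketch that only fetches bodies for Network.responseReceived events whose URL contains a keyword (the keyword and variable names are hypothetical placeholders):

target_keyword = "baidu.com"  # hypothetical filter, replace with the URL fragment you care about

for packet in performance_log:
    message = json.loads(packet["message"])["message"]
    if message["method"] != "Network.responseReceived":
        continue
    url = message["params"]["response"]["url"]
    if target_keyword not in url:
        continue
    request_id = message["params"]["requestId"]
    try:
        resp = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
        print(url, resp.get("body", "")[:200])  # first part of the body
    except Exception:
        pass  # some requests (redirects, cached entries, bodiless responses) have no retrievable body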
Complete code
Finally, here is the complete test code.
import json

from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service

chrome_option = Options()
# Enable the Chrome performance log so CDP network events are recorded
chrome_option.set_capability("goog:loggingPrefs", {"performance": "ALL"})
service = Service("./chromedriver-win64/chromedriver.exe")
driver = webdriver.Chrome(service=service, options=chrome_option)

driver.get("https://www.baidu.com")
performance_log = driver.get_log("performance")

for packet in performance_log:
    message = json.loads(packet.get("message")).get("message")
    packet_method = message.get("method")
    if "Network" in packet_method:
        request_id = message.get("params").get("requestId")
        try:
            resp = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
            body = resp.get("body")
            print(body)
        except Exception:
            # Not every requestId has a retrievable body (redirects, cached or evicted responses)
            pass
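If you want to reuse this, one option (a sketch of my own; the function name collect_response_bodies is hypothetical) is to wrap the loop in a helper that maps each response URL to its body. Keep in mind that get_log returns each entry only once, so call the helper after the page activity you care about:

def collect_response_bodies(driver):
    """Return {url: body} for the responses recorded in the performance log."""
    bodies = {}
    for packet in driver.get_log("performance"):
        message = json.loads(packet["message"])["message"]
        if message.get("method") != "Network.responseReceived":
            continue
        request_id = message["params"]["requestId"]
        url = message["params"]["response"]["url"]
        try:
            resp = driver.execute_cdp_cmd('Network.getResponseBody', {'requestId': request_id})
            bodies[url] = resp.get("body")
        except Exception:
            pass  # body not available for every request
    return bodies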