Python + Selenium multithreaded headless crawler example

Selenium multithreaded headless crawler
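I. Preface:
Some websites cannot be crawled from the page source, or the content you want is not in the page source at all (for example, because it is rendered by JavaScript).
In such cases Selenium is needed for crawling.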
II. Preparation:
Install selenium and a matching Google Chrome browser.
Installation: see the installation tutorial.
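Before writing the crawler, it can help to confirm that the pieces are in place. The following is a minimal sketch, assuming selenium was installed with `pip install selenium` and that Chrome is reachable through PATH under one of a few common Linux executable names. The function name `check_environment` and the name list are illustrative assumptions, not part of the original article. On Windows and macOS Chrome is usually not on PATH, so a missing path there does not prove Chrome is absent.

```python
import importlib.util
import shutil

# Executable names Chrome commonly uses on Linux (an assumption; adjust for your system).
CHROME_NAMES = ("google-chrome", "google-chrome-stable", "chromium", "chromium-browser", "chrome")


def check_environment():
    """Report whether the selenium package and a Chrome executable can be found."""
    chrome_path = None
    for name in CHROME_NAMES:
        chrome_path = shutil.which(name)  # full path, or None if not on PATH
        if chrome_path:
            break
    return {
        "selenium_installed": importlib.util.find_spec("selenium") is not None,
        "chrome_path": chrome_path,  # None when no listed name is on PATH
    }


if __name__ == "__main__":
    print(check_environment())
```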
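III. How the multithreading works:
1. Open several pages in the same browser. Each open page acts roughly like its own thread, which speeds up crawling.
2. Open several browsers at the same time, one per thread. Running them in parallel speeds up crawling.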
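Approach 2 can be sketched as a small worker pool: every thread starts its own browser and keeps taking addresses from a shared queue until none are left. This is only an illustration under stated assumptions. The names `crawl`, `open_browser`, `worker_count` and `open_headless_chrome` are hypothetical and do not appear in the original article; `open_browser` is any callable that returns an object with `get`, `page_source` and `quit`, which is what a Selenium driver provides.

```python
import queue
import threading


def crawl(urls, open_browser, worker_count=5):
    """Fetch every URL with `worker_count` threads; return {url: page source or None}."""
    tasks = queue.Queue()
    for address in urls:
        tasks.put(address)

    pages = {}
    lock = threading.Lock()

    def worker():
        browser = open_browser()  # one browser per thread; a driver is never shared
        try:
            while True:
                try:
                    address = tasks.get_nowait()
                except queue.Empty:
                    break  # queue drained, let this thread finish
                try:
                    browser.get(address)
                    source = browser.page_source
                except Exception:
                    source = None  # record the failure and keep the worker alive
                with lock:
                    pages[address] = source
        finally:
            browser.quit()  # always close the browser, even after an error

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return pages


def open_headless_chrome():
    """Start one headless Chrome (assumes selenium and Chrome are installed)."""
    from selenium import webdriver  # imported lazily so crawl() itself needs only the stdlib

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    options.add_argument("--disable-gpu")
    return webdriver.Chrome(options=options)
```

A call such as `crawl(address_list, open_headless_chrome, worker_count=5)` would then run five headless browsers side by side. Each browser is a full Chrome process, so the useful number of workers is limited by memory and CPU rather than by the thread count alone.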
Key code (partial):

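import time
import re
import threading
import queue

from selenium import webdriver

# Queue of addresses to crawl. Filling it is not shown in this excerpt;
# the address below is only a placeholder, replace it with real targets.
url = queue.Queue()
for i in range(5):
    url.put('http://www.baidu.com')


def jiexi1():  # run a Chrome browser in the background
    option = webdriver.ChromeOptions()
    option.add_argument('--headless')  # run in the background, no visible window
    option.add_argument('--blink-settings=imagesEnabled=false')  # do not load images, for speed
    option.add_argument('--disable-gpu')  # disable GPU acceleration
    web = webdriver.Chrome(options=option)  # start Chrome with these options
    web.get('http://www.baidu.com')  # initial page

    地址 = url.get()  # take one address (地址) from the queue

    js1 = "window.open('%s')" % 地址  # open the new address in a new page
    web.execute_script(js1)
    web.switch_to.window(web.window_handles[-1])
    jb1 = web.window_handles[-1]  # handle of the newly opened page
    # ... page parsing goes here (not part of this excerpt) ...
    web.quit()  # close the browser when the work is done


threads = []
for i in range(5):  # open several browsers, one per thread
    t1 = threading.Thread(target=jiexi1)
    t1.start()
    # time.sleep(1)
    threads.append(t1)
for t1 in threads:  # wait for every thread, not only the last one
    t1.join()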
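The key code opens the target address in a new page with `window.open` and then switches to the last window handle. Approach 1 (several pages in one browser) extends that step to a list of addresses. The sketch below is one possible way to do it, not the author's implementation: `open_tabs` and `collect_sources` are hypothetical helper names, the `timeout` and `poll` values are arbitrary, and the `document.readyState` polling is only a rough readiness check. The helpers use only the driver members the key code already relies on (`execute_script`, `window_handles`, `switch_to.window`) plus `page_source`.

```python
import time


def open_tabs(driver, urls):
    """Open every URL in its own page of one browser; return {url: window handle}."""
    handles = {}
    for address in urls:
        before = set(driver.window_handles)
        # Pass the URL as a script argument instead of formatting it into the
        # JavaScript, so quotes inside the URL cannot break the script.
        driver.execute_script("window.open(arguments[0]);", address)
        # The new page is the handle that did not exist before the call.
        handles[address] = next(h for h in driver.window_handles if h not in before)
    return handles


def collect_sources(driver, handles, timeout=10.0, poll=0.2):
    """Visit each page, wait until it reports 'complete', return {url: page source}."""
    pages = {}
    for address, handle in handles.items():
        driver.switch_to.window(handle)
        deadline = time.monotonic() + timeout
        while time.monotonic() < deadline:
            if driver.execute_script("return document.readyState") == "complete":
                break
            time.sleep(poll)
        pages[address] = driver.page_source  # taken as-is if the timeout expired
    return pages
```

Opening all pages first lets them load in the background while the remaining ones are still being opened. Content that arrives after the load event (for example through later JavaScript requests) may still be missing, in which case waiting for a specific element is the safer choice.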

Tags: python, crawler, selenium

Reposted from: https://blog.csdn.net/wg2627/article/details/127380184
Copyright belongs to the original author, wg2627. In case of infringement, please contact us for removal.
