A Detailed Guide to CAPTCHA-Recognition Login with Java and Selenium

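1. Preparation
Make sure the Java Development Kit (JDK) is installed.
Download and configure Selenium WebDriver.
Download and configure Tesseract-OCR, which performs the CAPTCHA recognition.
2. Open the website and set the browser window
First, open the browser and maximize the window so that every screenshot has the same dimensions, which keeps the CAPTCHA at the same pixel coordinates from run to run: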

java

import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class Main {
    public static void main(String[] args) {
        // Path to the ChromeDriver executable; adjust for your machine.
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");

        WebDriver driver = new ChromeDriver();
        driver.get("https://www.example.com");
        // Maximize so that screenshots always have the same dimensions.
        driver.manage().window().maximize();
    }
}
3. Capture the page that contains the CAPTCHA
Take a screenshot of the current page and save it to a local file:

java

import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import java.io.File;
import java.io.IOException;

public class Main {
    public static void main(String[] args) throws IOException {
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");

        WebDriver driver = new ChromeDriver();
        driver.get("https://www.example.com");
        driver.manage().window().maximize();

        // Capture the current page to a temporary file, then copy it to a fixed location.
        File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(screenshot, new File("H:/test/01.png"));
    }
}
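4. Recognize the CAPTCHA image
Recognizing the CAPTCHA with Tesseract
Locate the CAPTCHA within the screenshot, crop it out, and pass the cropped image to Tesseract: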
java

import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class Main {
    public static void main(String[] args) throws IOException, TesseractException {
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");

        WebDriver driver = new ChromeDriver();
        driver.get("https://www.example.com");
        driver.manage().window().maximize();

        File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(screenshot, new File("H:/test/01.png"));

        // Crop the CAPTCHA region (x, y, width, height); these values are specific to the target page.
        BufferedImage fullImg = ImageIO.read(new File("H:/test/01.png"));
        BufferedImage captchaImg = fullImg.getSubimage(564, 395, 79, 28);
        ImageIO.write(captchaImg, "png", new File("H:/test/02.png"));

        ITesseract instance = new Tesseract();
        instance.setDatapath("path/to/tessdata"); // path to the tessdata directory
        // Run OCR and keep only ASCII letters and digits.
        String captchaText = instance.doOCR(captchaImg).replaceAll("[^a-zA-Z0-9]", "");
        System.out.println("Captcha: " + captchaText);
    }
}
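The crop above relies on fixed coordinates, and `BufferedImage.getSubimage` throws a `RasterFormatException` when the requested region extends past the edge of the image (for example, if the window did not maximize to the expected size). The following is a minimal sketch, not part of the original article, of a helper that clamps the region to the screenshot bounds before cropping. The class name `CaptchaCrop` and the method `cropWithin` are hypothetical names chosen for illustration.

```java
import java.awt.Rectangle;
import java.awt.image.BufferedImage;

// Hypothetical helper for illustration; not part of Selenium or Tess4J.
final class CaptchaCrop {

    /** Clamps the requested region to the image bounds, then crops it. */
    static BufferedImage cropWithin(BufferedImage img, int x, int y, int w, int h) {
        Rectangle bounds = new Rectangle(0, 0, img.getWidth(), img.getHeight());
        Rectangle region = bounds.intersection(new Rectangle(x, y, w, h));
        if (region.isEmpty()) {
            // Nothing of the requested region is visible in the screenshot.
            throw new IllegalArgumentException(
                    "Region lies outside the screenshot: " + x + "," + y + " " + w + "x" + h);
        }
        return img.getSubimage(region.x, region.y, region.width, region.height);
    }

    public static void main(String[] args) {
        // A blank stand-in for a screenshot that is smaller than expected.
        BufferedImage shot = new BufferedImage(600, 400, BufferedImage.TYPE_INT_RGB);
        BufferedImage captcha = cropWithin(shot, 564, 395, 79, 28);
        System.out.println(captcha.getWidth() + "x" + captcha.getHeight());
    }
}
```

A clamped crop will usually still produce a poor OCR result, so in practice the smaller-than-expected size is mainly useful as a signal that the window size or the coordinates need to be checked.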
5. Enter the username, password, and CAPTCHA
Locate the username, password, and CAPTCHA input fields and fill them in:

java

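import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class Main {
    public static void main(String[] args) throws IOException, TesseractException {
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");

        WebDriver driver = new ChromeDriver();
        driver.get("https://www.example.com");
        driver.manage().window().maximize();

        File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(screenshot, new File("H:/test/01.png"));

        // Crop the CAPTCHA region (x, y, width, height); these values are specific to the target page.
        BufferedImage fullImg = ImageIO.read(new File("H:/test/01.png"));
        BufferedImage captchaImg = fullImg.getSubimage(564, 395, 79, 28);
        ImageIO.write(captchaImg, "png", new File("H:/test/02.png"));

        ITesseract instance = new Tesseract();
        instance.setDatapath("path/to/tessdata"); // path to the tessdata directory
        String captchaText = instance.doOCR(captchaImg).replaceAll("[^a-zA-Z0-9]", "");
        System.out.println("Captcha: " + captchaText);

        // The element IDs below belong to the target page; replace them with your own.
        WebElement username = driver.findElement(By.id("username"));
        WebElement password = driver.findElement(By.id("password_1"));
        WebElement captcha = driver.findElement(By.id("user_ck"));

        username.sendKeys("your_username");
        password.sendKeys("your_password");
        captcha.sendKeys(captchaText);
    }
}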
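The `replaceAll("[^a-zA-Z0-9]", "")` call strips whitespace and punctuation from the OCR output, but it cannot tell whether Tesseract read the right number of characters. As a rough sketch, the cleaning step could be paired with a length check before the text is typed into the form. The class `CaptchaText`, its methods, and the expected length of 4 are assumptions made for illustration; the article does not state how long the CAPTCHA is.

```java
import java.util.Optional;

// Hypothetical helper for illustration; not part of Tess4J.
final class CaptchaText {

    /** Keeps only ASCII letters and digits, the same filter the article applies. */
    static String clean(String rawOcr) {
        return rawOcr.replaceAll("[^a-zA-Z0-9]", "");
    }

    /** Returns the cleaned text only when it has the expected number of characters. */
    static Optional<String> cleanIfPlausible(String rawOcr, int expectedLength) {
        String cleaned = clean(rawOcr);
        return cleaned.length() == expectedLength ? Optional.of(cleaned) : Optional.empty();
    }

    public static void main(String[] args) {
        // Assumed CAPTCHA length of 4; adjust to the target site.
        System.out.println(cleanIfPlausible(" a7 K9\n", 4));
        System.out.println(cleanIfPlausible("a7\n", 4));
    }
}
```

An empty result signals that the recognition is probably wrong, so the caller can skip the submission instead of sending a CAPTCHA that is certain to fail.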
6. Click the login button
Locate the login button and click it:

java

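import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class Main {
    public static void main(String[] args) throws IOException, TesseractException {
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");

        WebDriver driver = new ChromeDriver();
        driver.get("https://www.example.com");
        driver.manage().window().maximize();

        File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
        FileUtils.copyFile(screenshot, new File("H:/test/01.png"));

        // Crop the CAPTCHA region (x, y, width, height); these values are specific to the target page.
        BufferedImage fullImg = ImageIO.read(new File("H:/test/01.png"));
        BufferedImage captchaImg = fullImg.getSubimage(564, 395, 79, 28);
        ImageIO.write(captchaImg, "png", new File("H:/test/02.png"));

        ITesseract instance = new Tesseract();
        instance.setDatapath("path/to/tessdata"); // path to the tessdata directory
        String captchaText = instance.doOCR(captchaImg).replaceAll("[^a-zA-Z0-9]", "");
        System.out.println("Captcha: " + captchaText);

        // The element IDs below belong to the target page; replace them with your own.
        WebElement username = driver.findElement(By.id("username"));
        WebElement password = driver.findElement(By.id("password_1"));
        WebElement captcha = driver.findElement(By.id("user_ck"));

        username.sendKeys("your_username");
        password.sendKeys("your_password");
        captcha.sendKeys(captchaText);

        // The login button is located by its name attribute, which is also page-specific.
        WebElement loginButton = driver.findElement(By.name("yt0"));
        loginButton.click();
    }
}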
7. Close the browser
Finally, close the browser with driver.quit(), which also ends the WebDriver session:

java

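import net.sourceforge.tess4j.ITesseract;
import net.sourceforge.tess4j.Tesseract;
import net.sourceforge.tess4j.TesseractException;
import org.apache.commons.io.FileUtils;
import org.openqa.selenium.By;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.WebElement;
import org.openqa.selenium.chrome.ChromeDriver;

import java.awt.image.BufferedImage;
import java.io.File;
import java.io.IOException;
import javax.imageio.ImageIO;

public class Main {
    public static void main(String[] args) throws IOException, TesseractException {
        System.setProperty("webdriver.chrome.driver", "path/to/chromedriver");

        WebDriver driver = new ChromeDriver();
        try {
            driver.get("https://www.example.com");
            driver.manage().window().maximize();

            File screenshot = ((TakesScreenshot) driver).getScreenshotAs(OutputType.FILE);
            FileUtils.copyFile(screenshot, new File("H:/test/01.png"));

            // Crop the CAPTCHA region (x, y, width, height); these values are specific to the target page.
            BufferedImage fullImg = ImageIO.read(new File("H:/test/01.png"));
            BufferedImage captchaImg = fullImg.getSubimage(564, 395, 79, 28);
            ImageIO.write(captchaImg, "png", new File("H:/test/02.png"));

            ITesseract instance = new Tesseract();
            instance.setDatapath("path/to/tessdata"); // path to the tessdata directory
            String captchaText = instance.doOCR(captchaImg).replaceAll("[^a-zA-Z0-9]", "");
            System.out.println("Captcha: " + captchaText);

            // The element locators below belong to the target page; replace them with your own.
            WebElement username = driver.findElement(By.id("username"));
            WebElement password = driver.findElement(By.id("password_1"));
            WebElement captcha = driver.findElement(By.id("user_ck"));

            username.sendKeys("your_username");
            password.sendKeys("your_password");
            captcha.sendKeys(captchaText);

            WebElement loginButton = driver.findElement(By.name("yt0"));
            loginButton.click();
        } finally {
            // Close the browser even if one of the steps above throws.
            driver.quit();
        }
    }
}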
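8. Problems and solutions
Fixing Tesseract-OCR errors
If Tesseract reports a tesseract-ocr related error while recognizing the image, download and install tesseract-ocr from the tesseract-ocr download page.

Then set the Tesseract instance's datapath so that it points to the tessdata directory of the tesseract-ocr installation: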

java

ITesseract instance = new Tesseract();
instance.setDatapath("path/to/tessdata"); // path to the tessdata directory
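
A common cause of datapath errors is that the directory does not contain the language file Tesseract is asked to load. Tesseract reads a file named `<language>.traineddata` from the tessdata directory, and English (`eng`) is the default language. The sketch below, which is an illustration rather than part of the original article, checks for that file before any OCR call. The class `TessdataCheck` and the method `hasLanguageData` are hypothetical names, and the placeholder path is the one used in the article.

```java
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper for illustration; not part of Tess4J.
final class TessdataCheck {

    /** Returns true if the directory contains <language>.traineddata as a regular file. */
    static boolean hasLanguageData(Path tessdataDir, String language) {
        return Files.isRegularFile(tessdataDir.resolve(language + ".traineddata"));
    }

    public static void main(String[] args) {
        // Placeholder path from the article; pass the real directory as the first argument.
        Path tessdata = Paths.get(args.length > 0 ? args[0] : "path/to/tessdata");
        if (hasLanguageData(tessdata, "eng")) {
            System.out.println("Found eng.traineddata in " + tessdata);
        } else {
            System.out.println("Missing eng.traineddata in " + tessdata);
        }
    }
}
```

Running this check once at startup gives a clearer message than the error raised later by the OCR call.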

This article is reposted from: https://blog.csdn.net/asfdsgdf/article/details/140112479
Copyright belongs to the original author, asfdsgdf.
