A Step-by-Step Guide to Scraping Watermark-Free Videos: Latest Java + Selenium Edition

Covers fetching videos and titles from Douyin and Kuaishou

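1. Preface

This post walks through everything from installing multiple versions of Chrome side by side on one PC to using Java with Selenium to scrape data from web pages and APIs. Douyin and Kuaishou change their request formats from time to time, so check that the versions used here still match what the sites serve. The steps apply to Windows 10 and Windows 11. Read on if this is useful to you.

2. Environment setup

2.1. Browser environment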

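Browser installation reference: (link)
First, create a folder on any drive and name it whatever you like; here it is called old_chrome. Download the launcher GoogleChromePortable.exe and put it in the old_chrome folder.
GoogleChromePortable.exe download: (link)
Open the downloaded file with 360 Zip or another archive tool from the right-click menu. Do not extract it: right-click, choose to open it with the archive tool, and drag GoogleChromePortable.exe out.
Create a new folder under old_chrome. I use version 114, so for easy reference I named the folder old_chrome114. Download the offline installer for the matching Chrome version (a file larger than 50 MB is generally an offline installer). It is a .exe file; put it in the old_chrome114 folder.
If you cannot find an offline installer, see this download page for Chrome versions before 107: (link)
Next, inspect the installer: right-click it, choose Properties, open the Digital Signatures tab, and double-click the signer's name in the list to check that the digital signature information is valid. Continue only if the signature is valid.
Then, in the same way, right-click the installer and open it with the archive tool; you will see chrome.7z inside.
Create a folder named APP in old_chrome114, drag chrome.7z into it, and extract it there to get the Chrome-bin folder. The chrome.7z archive can then be deleted.
Copy the GoogleChromePortable.exe downloaded earlier into the old_chrome114 folder. I added the version number to the name, making it GoogleChromePortable114.exe.
Double-click GoogleChromePortable114.exe to launch Chrome 114. On first launch it creates a Data folder in the current folder to store its data. You can confirm the version under About Chrome in the browser; I run two different versions of Chrome side by side this way. You can also right-click GoogleChromePortable114.exe and choose Send to > Desktop (create shortcut) to get a desktop shortcut. To install more versions of the browser, repeat these steps for each one.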


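2.2. Browser driver

Search Baidu for the chromedriver build that matches your Chrome version, download it into that version's folder, and extract it to get chromedriver.exe. Note this path; the code needs it later.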
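Because the driver has to match the browser, it can help to compare the two version strings before launching anything. The following is a minimal sketch, not part of the original post: the class `ChromeVersionCheck` and its methods are hypothetical, and it assumes that matching on the major version (the number before the first dot) is a good enough check for the 114 setup described here.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not from the original post) for comparing Chrome and chromedriver versions. */
final class ChromeVersionCheck {

    // Matches the leading major number of a dotted version such as "114.0.5735.90".
    private static final Pattern MAJOR = Pattern.compile("^\\s*(\\d{1,4})\\.");

    private ChromeVersionCheck() {
    }

    /** Returns the major version, or -1 if the string does not look like a dotted version. */
    static int majorVersion(String version) {
        if (version == null) {
            return -1;
        }
        Matcher m = MAJOR.matcher(version);
        return m.find() ? Integer.parseInt(m.group(1)) : -1;
    }

    /** True only when both strings parse and share the same major version. */
    static boolean sameMajor(String chromeVersion, String driverVersion) {
        int chrome = majorVersion(chromeVersion);
        return chrome != -1 && chrome == majorVersion(driverVersion);
    }
}
```

For example, `ChromeVersionCheck.sameMajor("114.0.5735.199", "114.0.5735.90")` is true, while pairing a 114 browser with a 107 driver is false. The version strings are sample values; read the real ones from About Chrome and from the name of the chromedriver download.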

2.3. Development environment

Development uses JDK 1.8 with a Spring project.
The scraper uses version 4.10.0 of the Selenium dependencies:

<!-- Scraper -->
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-java</artifactId>
    <version>4.10.0</version>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-chromium-driver</artifactId>
    <version>4.10.0</version>
</dependency>
<dependency>
    <groupId>org.seleniumhq.selenium</groupId>
    <artifactId>selenium-devtools-v114</artifactId>
    <version>4.10.0</version>
</dependency>

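3. Scraping Douyin

Since Douyin's redesign, the video URL and the title can no longer both be fetched directly from the API. The video URL is still available by capturing the API request, while the title has to be cut out of the page's front-end HTML.

3.1. Fetching the video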

package cn.executor;

import cn.hutool.http.HttpUtil;
import cn.perfectlinks.node.properties.RemoveWatermarkProperties;
import cn.perfectlinks.node.utils.RemoveWatermarkConstant;
import cn.perfectlinks.node.vo.VideoRemoveWatermarkVo;
import cn.perfectlinks.node.vo.VideoUrlVo;
import com.alibaba.fastjson2.JSONObject;
import lombok.RequiredArgsConstructor;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.devtools.DevTools;
import org.openqa.selenium.devtools.v114.network.Network;
import org.openqa.selenium.devtools.v114.network.model.Request;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@Component
@Slf4j
@RequiredArgsConstructor
public class DYVideo {

    private final RemoveWatermarkProperties removeWatermarkProperties;

    @SneakyThrows
    public VideoRemoveWatermarkVo executor(String oldVideoUrl) throws IOException {
        log.info("Request parameter: " + oldVideoUrl);
        VideoRemoveWatermarkVo videoRemoveWatermarkVo = new VideoRemoveWatermarkVo();
        // Extract the video URL from the share text
        String filterUrl = this.filterUrl(oldVideoUrl);
        // Parse the video
        if (oldVideoUrl.contains(RemoveWatermarkConstant.D_Y_COM)) {
            // Douyin may show a human-verification challenge, so retry the call a few times
            Integer n = RemoveWatermarkConstant.ZERO;
            do {
                n++;
                // videoRemoveWatermarkVo = this.douYinParseUrl(filterUrl);
                VideoUrlVo videoUrlVo = this.getTrueAddress(filterUrl, RemoveWatermarkConstant.D_Y_TYPE);
                // Remove the watermark; skip this attempt if no URL was captured (avoids a NullPointerException)
                String responseVideoUrl = videoUrlVo.getResponseVideoUrl();
                if (StringUtils.isNotBlank(responseVideoUrl)) {
                    videoRemoveWatermarkVo.setUrl(
                            responseVideoUrl.replaceAll(RemoveWatermarkConstant.PLAY_WM, RemoveWatermarkConstant.PLAY));
                }
            } while (StringUtils.isBlank(videoRemoveWatermarkVo.getUrl()) && n <= RemoveWatermarkConstant.FIVE);
        } else {
            throw new Exception(RemoveWatermarkConstant.ONLY_SUPPORT_ERR);
        }
        if (videoRemoveWatermarkVo.getUrl() == null) {
            throw new Exception(RemoveWatermarkConstant.SHARING_FAILURE);
        }
        return videoRemoveWatermarkVo;
    }

    /**
     * Removes the watermark from a Douyin video (the older API-based approach).
     */
    private VideoRemoveWatermarkVo douYinParseUrl(String url) {
        VideoRemoveWatermarkVo videoRemoveWatermarkVo = new VideoRemoveWatermarkVo();
        try {
            VideoUrlVo trueAddress = this.getTrueAddress(url, RemoveWatermarkConstant.D_Y_TYPE);
            log.info(RemoveWatermarkConstant.D_Y_DATA, trueAddress);
            if (StringUtils.isBlank(trueAddress.getResponseVideoUrl())) {
                return videoRemoveWatermarkVo;
            }
            // Call the Douyin API to fetch the video data
            String jsonStr = HttpUtil.get(trueAddress.getResponseVideoUrl());
            log.info(RemoveWatermarkConstant.D_Y_API_DATA, jsonStr);
            if (StringUtils.isBlank(jsonStr)) {
                return videoRemoveWatermarkVo;
            }
            JSONObject obj = JSONObject.parseObject(jsonStr);
            // Get the real URL of the current video
            String videoAddress = obj.getJSONArray(RemoveWatermarkConstant.ITEM_LIST)
                    .getJSONObject(RemoveWatermarkConstant.ZERO)
                    .getJSONObject(RemoveWatermarkConstant.VIDEO)
                    .getJSONObject(RemoveWatermarkConstant.PLAY_ADDR)
                    .getJSONArray(RemoveWatermarkConstant.URL_LIST)
                    .get(RemoveWatermarkConstant.ZERO)
                    .toString();
            // Replace playwm with play
            videoAddress = videoAddress.replaceAll(RemoveWatermarkConstant.PLAY_WM, RemoveWatermarkConstant.PLAY);
            // Video title
            String title = obj.getJSONArray(RemoveWatermarkConstant.ITEM_LIST)
                    .getJSONObject(RemoveWatermarkConstant.ZERO)
                    .getString(RemoveWatermarkConstant.DESC);
            videoRemoveWatermarkVo.setUrl(videoAddress).setTitle(title);
        } catch (Exception e) {
            log.error(RemoveWatermarkConstant.D_Y_API_ERR, e.getMessage());
        }
        log.info("videoRemoveWatermarkVo: {}", videoRemoveWatermarkVo.toString());
        return videoRemoveWatermarkVo;
    }

    /**
     * Strips the Chinese text from the share message, leaving only the URL.
     */
    private String filterUrl(String url) {
        Matcher m = Pattern.compile(RemoveWatermarkConstant.REGEX).matcher(url);
        if (m.find()) {
            return url.substring(m.start(), m.end());
        }
        return "";
    }

    /**
     * Captures the URL and request parameters needed to fetch the original video.
     */
    private VideoUrlVo getTrueAddress(String url, Integer type) {
        VideoUrlVo videoUrlVo = new VideoUrlVo();
        // Driver path for local testing
        // System.setProperty(RemoveWatermarkConstant.DRIVER_URL, "D:\\Program Files\\old_chrome\\old_chrome114\\chromedriver114win32\\chromedriver.exe");
        System.setProperty(RemoveWatermarkConstant.DRIVER_URL, removeWatermarkProperties.getDriver_url());
        // Configure the Chrome options
        ChromeOptions options = new ChromeOptions();
        // Local browser path
        // options.setBinary("D:\\Program Files\\old_chrome\\old_chrome114\\APP\\Chrome-bin\\chrome.exe");
        options.addArguments(RemoveWatermarkConstant.CHROME_USER_AGENT + RemoveWatermarkConstant.CHROME_USER_AGENT_ANDROID);
        options.addArguments(RemoveWatermarkConstant.DISABLE_BLINK_FEATURES);
        options.addArguments(RemoveWatermarkConstant.DISABLE_EXTENSIONS);
        options.addArguments(RemoveWatermarkConstant.DISABLE_POPUP_BLOCKING);
        // Set the window size to emulate a mobile device
        options.addArguments(RemoveWatermarkConstant.WINDOW_SIZE);
        // Create the ChromeDriver with the ChromeOptions
        ChromeDriver driver = new ChromeDriver(options);
        try {
            // Enable Chrome DevTools
            DevTools devTools = driver.getDevTools();
            devTools.createSession();
            // Douyin uses a GET request, so the URL alone is enough.
            // This is the capture logic for the updated Douyin site.
            devTools.addListener(Network.requestWillBeSent(), event -> {
                Request request = event.getRequest();
                if (Objects.nonNull(request) && request.getUrl().contains(RemoveWatermarkConstant.D_Y_VIDEO_URL)) {
                    videoUrlVo.setResponseVideoUrl(request.getUrl());
                }
                if (Objects.nonNull(request) && request.getUrl().contains(RemoveWatermarkConstant.D_Y_RE_VIDEO_URL)) {
                    videoUrlVo.setRedirectUrl(request.getUrl());
                }
            });
            // Enable network monitoring so the listener receives events
            devTools.send(Network.enable(Optional.empty(), Optional.empty(), Optional.empty()));
            // Open the target page
            driver.get(url);
            try {
                TimeUnit.SECONDS.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        } finally {
            // Close the browser even if loading the page fails
            driver.quit();
        }
        return videoUrlVo;
    }
}
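The two string operations in this class, pulling the URL out of the share text and swapping playwm for play, are easy to try in isolation. Here is a minimal sketch under stated assumptions: the post does not show the values of `RemoveWatermarkConstant.REGEX`, `PLAY_WM`, or `PLAY`, so the URL pattern and the literal strings `"playwm"` and `"play"` below are guesses, and `ShareLinkSketch` is a hypothetical class name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch of the share-text filtering and the playwm -> play replacement. */
final class ShareLinkSketch {

    // Assumed pattern; the real RemoveWatermarkConstant.REGEX is not shown in the post.
    // \w is ASCII-only by default in Java, so the match stops at whitespace or Chinese text.
    private static final Pattern URL_PATTERN = Pattern.compile("https?://[\\w\\-./?%&=#:]+");

    private ShareLinkSketch() {
    }

    /** Returns the first http(s) URL found in the share text, or "" if there is none. */
    static String extractUrl(String shareText) {
        if (shareText == null) {
            return "";
        }
        Matcher m = URL_PATTERN.matcher(shareText);
        return m.find() ? m.group() : "";
    }

    /** Replaces the assumed watermark path segment "playwm" with "play" (literal replace, no regex). */
    static String stripWatermarkSegment(String videoUrl) {
        return videoUrl.replace("playwm", "play");
    }
}
```

Using `String.replace` instead of `replaceAll` keeps the replacement literal, so no character in the constant is read as regex syntax. Whether the rewritten URL actually serves a watermark-free file depends on the site's current behavior.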

3.2. Fetching the title

package cn.executor;

import cn.perfectlinks.node.properties.RemoveWatermarkProperties;
import cn.perfectlinks.node.utils.RemoveWatermarkConstant;
import com.perfectlinks.applink.common.core.exception.Assert;
import lombok.RequiredArgsConstructor;
import lombok.extern.slf4j.Slf4j;
import org.apache.http.HttpEntity;
import org.apache.http.HttpStatus;
import org.apache.http.client.methods.CloseableHttpResponse;
import org.apache.http.client.methods.HttpGet;
import org.apache.http.client.utils.HttpClientUtils;
import org.apache.http.impl.client.CloseableHttpClient;
import org.apache.http.impl.client.HttpClients;
import org.apache.http.util.EntityUtils;
import org.springframework.stereotype.Component;

import java.io.IOException;

@Component
@Slf4j
@RequiredArgsConstructor
public class Title {

    public String titleFetch(String redirectUrl) {
        // 1. Create the HTTP client
        CloseableHttpClient httpClient = HttpClients.createDefault();
        CloseableHttpResponse response = null;
        // 2. Create the GET request
        HttpGet request = new HttpGet(redirectUrl);
        // Set the User-Agent header so the request looks like it comes from a browser
        request.setHeader("User-Agent", RemoveWatermarkConstant.CHROME_USER_AGENT_ANDROID);
        try {
            // 3. Execute the GET request
            response = httpClient.execute(request);
            // 4. Continue only if the response status is 200
            Assert.isTrue(response.getStatusLine().getStatusCode() == HttpStatus.SC_OK, "Failed to fetch the video title");
            // 5. Read the response body
            HttpEntity httpEntity = response.getEntity();
            Assert.isTrue(httpEntity != null, "Failed to fetch the video title");
            String html = EntityUtils.toString(httpEntity, "utf-8");
            String extractedContent = extractContent(html);
            Assert.isTrue(!"".equals(extractedContent), "Failed to fetch the video title");
            // The fourth quote-delimited piece is the content attribute; keep the part before " - "
            return extractedContent.split("\"")[3].split(" - ")[0];
        } catch (IOException e) {
            // ClientProtocolException is a subclass of IOException, so one catch covers both
            log.error("Failed to fetch the video title", e);
        } finally {
            // 6. Release the response and the client
            HttpClientUtils.closeQuietly(response);
            HttpClientUtils.closeQuietly(httpClient);
        }
        return null;
    }

    // Cuts out the HTML between the description meta tag and the keywords meta tag
    public static String extractContent(String htmlString) {
        String startTag = "name=\"description\" content=\"";
        String endTag = "\"/><meta data-react-helmet=\"true\" name=\"keywords\"";
        int startIndex = htmlString.indexOf(startTag);
        int endIndex = htmlString.indexOf(endTag);
        if (startIndex == -1 || endIndex == -1) {
            return "";
        }
        return htmlString.substring(startIndex, endIndex);
    }
}
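The cut-by-index approach above depends on the keywords meta tag following the description tag exactly. As an alternative, here is a minimal regex-based sketch of the same idea; it is not the original author's method. `DescriptionTitleSketch` is a hypothetical class, and it assumes the `name` attribute comes directly before `content` and that the content contains no escaped double quotes.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical sketch: read the title from the description meta tag with a regex. */
final class DescriptionTitleSketch {

    // Captures the content attribute of a tag containing name="description" content="...".
    private static final Pattern DESCRIPTION =
            Pattern.compile("name=\"description\"\\s+content=\"([^\"]*)\"");

    private DescriptionTitleSketch() {
    }

    /** Returns the text before the first " - " in the description, or "" if the tag is missing. */
    static String titleFrom(String html) {
        if (html == null) {
            return "";
        }
        Matcher m = DESCRIPTION.matcher(html);
        if (!m.find()) {
            return "";
        }
        // Same convention as the original code: the title is the part before " - "
        return m.group(1).split(" - ")[0];
    }
}
```

With a made-up page containing `<meta data-react-helmet="true" name="description" content="A sample title - posted on Douyin"/>`, `titleFrom` returns `A sample title`. If the site changes its markup, either approach needs adjusting.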

4. Scraping Kuaishou

package cn.perfectlinks.node.executor;

import cn.hutool.http.HttpRequest;
import cn.hutool.http.HttpResponse;
import cn.hutool.json.JSONUtil;
import cn.perfectlinks.node.properties.RemoveWatermarkProperties;
import cn.perfectlinks.node.utils.RemoveWatermarkConstant;
import cn.perfectlinks.node.vo.VideoRemoveWatermarkVo;
import cn.perfectlinks.node.vo.VideoUrlVo;
import com.alibaba.fastjson2.JSONObject;
import lombok.RequiredArgsConstructor;
import lombok.SneakyThrows;
import lombok.extern.slf4j.Slf4j;
import org.apache.commons.lang3.StringUtils;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;
import org.openqa.selenium.devtools.DevTools;
import org.openqa.selenium.devtools.v114.network.Network;
import org.springframework.stereotype.Component;

import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Objects;
import java.util.Optional;
import java.util.concurrent.TimeUnit;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

@Component
@Slf4j
@RequiredArgsConstructor
public class KSVideo {

    private final RemoveWatermarkProperties removeWatermarkProperties;

    @SneakyThrows
    public VideoRemoveWatermarkVo executor(String oldVideoUrl) throws IOException {
        log.info("Request parameter: " + oldVideoUrl);
        VideoRemoveWatermarkVo videoRemoveWatermarkVo = new VideoRemoveWatermarkVo();
        // Extract the video URL from the share text
        String filterUrl = this.filterUrl(oldVideoUrl);
        // Parse the video
        if (oldVideoUrl.contains(RemoveWatermarkConstant.K_S_COM)) {
            videoRemoveWatermarkVo = this.ksParseUrl(filterUrl);
        } else {
            throw new Exception(RemoveWatermarkConstant.ONLY_SUPPORT_ERR);
        }
        if (videoRemoveWatermarkVo.getUrl() == null) {
            throw new Exception(RemoveWatermarkConstant.SHARING_FAILURE);
        }
        return videoRemoveWatermarkVo;
    }

    /**
     * Removes the watermark from a Kuaishou video.
     */
    private VideoRemoveWatermarkVo ksParseUrl(String url) {
        VideoRemoveWatermarkVo videoRemoveWatermarkVo = new VideoRemoveWatermarkVo();
        // Capture the request data with the browser
        VideoUrlVo trueAddress = this.getTrueAddress(url, RemoveWatermarkConstant.K_S_TYPE);
        log.info(RemoveWatermarkConstant.K_S_DATA, trueAddress.getResponseVideoUrl(), trueAddress.getReferer());
        if (StringUtils.isBlank(trueAddress.getResponseVideoUrl()) || StringUtils.isBlank(trueAddress.getReferer())) {
            return videoRemoveWatermarkVo;
        }
        // Fetch the Kuaishou cookie
        this.getCookieInfo(trueAddress);
        log.info(RemoveWatermarkConstant.K_S_COOKIE, trueAddress.getCookieInfo());
        if (StringUtils.isBlank(trueAddress.getCookieInfo())) {
            return videoRemoveWatermarkVo;
        }
        try {
            if (StringUtils.isBlank(trueAddress.getVideoPostBody())) {
                return videoRemoveWatermarkVo;
            }
            String videoPostBody = trueAddress.getVideoPostBody();
            JSONObject obj = JSONObject.parseObject(videoPostBody);
            // Build the body of the POST request
            cn.hutool.json.JSONObject map = JSONUtil.createObj();
            this.setPostParams(obj, map);
            if (StringUtils.isBlank(trueAddress.getResponseVideoUrl())) {
                return videoRemoveWatermarkVo;
            }
            HttpResponse execute = HttpRequest.post(trueAddress.getResponseVideoUrl())
                    .header(RemoveWatermarkConstant.USER_AGENT, RemoveWatermarkConstant.CHROME_USER_AGENT_IPHONE)
                    .header(RemoveWatermarkConstant.COOKIE, trueAddress.getCookieInfo())
                    .header(RemoveWatermarkConstant.REFERER, trueAddress.getReferer())
                    .body(map.toString())
                    .execute();
            String body = execute.body();
            if (StringUtils.isBlank(body)) {
                return videoRemoveWatermarkVo;
            }
            JSONObject jsonObject = JSONObject.parseObject(body);
            // Get the title
            String title = jsonObject.getJSONObject(RemoveWatermarkConstant.SHARE_INFO)
                    .getString(RemoveWatermarkConstant.SHARE_TITLE);
            // Get the watermark-free video URL
            String videoAddress = jsonObject.getString(RemoveWatermarkConstant.MP4_URL);
            videoRemoveWatermarkVo.setTitle(title).setUrl(videoAddress);
        } catch (Exception e) {
            log.error(RemoveWatermarkConstant.K_S_API_ERR, e.getMessage());
        }
        return videoRemoveWatermarkVo;
    }

    // Copies the fields of the captured request body into the new POST body
    private void setPostParams(JSONObject obj, cn.hutool.json.JSONObject map) {
        map.set(RemoveWatermarkConstant.FID, obj.getString(RemoveWatermarkConstant.FID));
        map.set(RemoveWatermarkConstant.SHARE_TOKEN, obj.getString(RemoveWatermarkConstant.SHARE_TOKEN));
        map.set(RemoveWatermarkConstant.SHARE_OBJECT_ID, obj.getString(RemoveWatermarkConstant.SHARE_OBJECT_ID));
        map.set(RemoveWatermarkConstant.SHARE_METHOD, obj.getString(RemoveWatermarkConstant.SHARE_METHOD));
        map.set(RemoveWatermarkConstant.SHARE_ID, obj.getString(RemoveWatermarkConstant.SHARE_ID));
        map.set(RemoveWatermarkConstant.SHARE_RESOURCE_TYPE, obj.getString(RemoveWatermarkConstant.SHARE_RESOURCE_TYPE));
        map.set(RemoveWatermarkConstant.SHARE_CHANNEL, obj.getString(RemoveWatermarkConstant.SHARE_CHANNEL));
        map.set(RemoveWatermarkConstant.KPN, obj.getString(RemoveWatermarkConstant.KPN));
        map.set(RemoveWatermarkConstant.SUB_BIZ, obj.getString(RemoveWatermarkConstant.SUB_BIZ));
        map.set(RemoveWatermarkConstant.ENV, obj.getString(RemoveWatermarkConstant.ENV));
        map.set(RemoveWatermarkConstant.H5_DOMAIN, obj.getString(RemoveWatermarkConstant.H5_DOMAIN));
        map.set(RemoveWatermarkConstant.PHOTO_ID, obj.getString(RemoveWatermarkConstant.PHOTO_ID));
        map.set(RemoveWatermarkConstant.IS_LONG_VIDEO, obj.getString(RemoveWatermarkConstant.IS_LONG_VIDEO));
    }

    /**
     * Strips the Chinese text from the share message, leaving only the URL.
     */
    private String filterUrl(String url) {
        Matcher m = Pattern.compile(RemoveWatermarkConstant.REGEX).matcher(url);
        if (m.find()) {
            return url.substring(m.start(), m.end());
        }
        return "";
    }

    /**
     * Captures the URL and request parameters needed to fetch the original video.
     */
    private VideoUrlVo getTrueAddress(String url, Integer type) {
        VideoUrlVo videoUrlVo = new VideoUrlVo();
        // Driver path for local testing
        // System.setProperty(RemoveWatermarkConstant.DRIVER_URL, "D:\\Program Files\\old_chrome\\chrome114\\chromedriver114win32\\chromedriver.exe");
        System.setProperty(RemoveWatermarkConstant.DRIVER_URL, removeWatermarkProperties.getDriver_url());
        // Configure the Chrome options
        ChromeOptions options = new ChromeOptions();
        // Local browser path
        // options.setBinary("D:\\Program Files\\old_chrome\\chrome114\\APP\\Chrome-bin\\chrome.exe");
        options.addArguments(RemoveWatermarkConstant.CHROME_USER_AGENT + RemoveWatermarkConstant.CHROME_USER_AGENT_ANDROID);
        options.addArguments(RemoveWatermarkConstant.DISABLE_BLINK_FEATURES);
        options.addArguments(RemoveWatermarkConstant.DISABLE_EXTENSIONS);
        options.addArguments(RemoveWatermarkConstant.DISABLE_POPUP_BLOCKING);
        // Set the window size to emulate a mobile device
        options.addArguments(RemoveWatermarkConstant.WINDOW_SIZE);
        // Create the ChromeDriver with the ChromeOptions
        ChromeDriver driver = new ChromeDriver(options);
        try {
            // Enable Chrome DevTools
            DevTools devTools = driver.getDevTools();
            devTools.createSession();
            // Kuaishou uses a POST request, so the request body has to be captured as well
            devTools.addListener(Network.requestWillBeSent(), event -> {
                if (Objects.nonNull(event.getRequest())
                        && RemoveWatermarkConstant.POST.equals(event.getRequest().getMethod())) {
                    if (event.getRequest().getUrl().contains(RemoveWatermarkConstant.K_S_URL)) {
                        videoUrlVo.setResponseVideoUrl(event.getRequest().getUrl());
                        event.getRequest().getPostData().ifPresent(videoUrlVo::setVideoPostBody);
                        String referer = Objects.requireNonNull(
                                event.getRequest().getHeaders().get(RemoveWatermarkConstant.REFERER)).toString();
                        videoUrlVo.setReferer(referer);
                    }
                }
            });
            // Enable network monitoring so the listener receives events
            devTools.send(Network.enable(Optional.empty(), Optional.empty(), Optional.empty()));
            // Open the target page
            driver.get(url);
            try {
                TimeUnit.SECONDS.sleep(2);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new RuntimeException(e);
            }
        } finally {
            // Close the browser even if loading the page fails
            driver.quit();
        }
        return videoUrlVo;
    }

    private void getCookieInfo(VideoUrlVo trueAddress) {
        try {
            URL urlOne = new URL(RemoveWatermarkConstant.GET_COOKIE_URL);
            HttpURLConnection connection = (HttpURLConnection) urlOne.openConnection();
            try {
                connection.setRequestMethod(RemoveWatermarkConstant.POST);
                connection.setRequestProperty(RemoveWatermarkConstant.USER_AGENT, RemoveWatermarkConstant.USER_AGENT_V);
                String cookieHeader = connection.getHeaderField(RemoveWatermarkConstant.SET_COOKIE);
                // The header is null when the server sets no cookie; leave cookieInfo unset in that case
                if (cookieHeader != null) {
                    String[] cookies = cookieHeader.split(RemoveWatermarkConstant.SPLIT);
                    if (cookies.length > 0) {
                        trueAddress.setCookieInfo(cookies[RemoveWatermarkConstant.ZERO]);
                    }
                }
            } finally {
                connection.disconnect();
            }
        } catch (IOException e) {
            log.error(RemoveWatermarkConstant.GET_COOKIE_ERR, e.getMessage());
        }
    }
}
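The cookie step keeps only the first piece of the Set-Cookie header, which is the name=value pair that goes back in the Cookie header. Below is a minimal sketch of that parsing on its own, assuming `RemoveWatermarkConstant.SPLIT` is `";"` (the post does not show its value). `SetCookieSketch` is a hypothetical class, and the header string in the usage note is made-up sample data.

```java
/** Hypothetical sketch: take the leading name=value pair from a Set-Cookie header value. */
final class SetCookieSketch {

    private SetCookieSketch() {
    }

    /** Returns the text before the first ';', trimmed, or "" when the header is missing or empty. */
    static String firstCookiePair(String setCookieHeader) {
        if (setCookieHeader == null) {
            return "";
        }
        // indexOf avoids the empty-array case that split(";") produces for input like ";"
        int end = setCookieHeader.indexOf(';');
        String pair = end == -1 ? setCookieHeader : setCookieHeader.substring(0, end);
        return pair.trim();
    }
}
```

For instance, `firstCookiePair("sid=abc123; Max-Age=31536000; Path=/")` yields `sid=abc123`, and a missing header yields an empty string instead of a NullPointerException.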

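5. Conclusion

Scraping in Java comes with quite a few limitations, and Python is usually the more convenient choice. This post is for reference only; if you have questions, feel free to discuss them.

Tags: scraping, java, selenium

Reposted from: https://blog.csdn.net/weixin_56772904/article/details/135225886
Copyright belongs to the original author, 疆果. If this infringes your rights, please contact us for removal.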
