
Batch Image Scraping

news 2025/9/21 12:53:49

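These are personal study notes, shared for discussion and reference only. They are not professional teaching material, so please judge the content for yourself.
Images obtained while following these notes may be used only for personal technical research (for example, testing download logic or parsing code). They must not be used commercially (for example, as product assets or for redistribution), and content that involves copyright or privacy must not be shared.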

Table of Contents

Preface
1. Batch Image Scraping
- 1.1 Overview
- 1.2 Use in a Project

Preface

This article shows how to use a third-party image search engine to fetch images in bulk. The implementation here uses Bing.

1. Batch Image Scraping

1.1 Overview

To fetch images in bulk through Bing, start from this base URL:

https://cn.bing.com/images/async?q=xxx

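q is the required parameter and specifies the search keyword. There are also several common optional parameters:

first: optional. The index of the first image to return, used for paging. For example, first=35 returns results starting from the 35th image.
count: optional. The number of results to return. The default may be 35 and the maximum is typically 150. For example, count=20 returns 20 images.
cw and ch: optional. Possibly the image width and height, for example cw=1177&ch=737.
mmasync: optional. Possibly controls some part of the asynchronous loading mechanism. It is usually set to 1.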
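
As a rough illustration of how these parameters fit together, the sketch below assembles a request URL using only the JDK. The class name BingImageUrl and the method build are hypothetical, and the parameter meanings are the ones guessed at above rather than documented behavior. The keyword is URL-encoded so that spaces and non-ASCII keywords survive; the Charset overload of URLEncoder.encode used here requires Java 10 or later.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
class BingImageUrl {
    static final String BASE = "https://cn.bing.com/images/async";

    /** Builds the async search URL; the keyword is URL-encoded (form style, space becomes '+'). */
    static String build(String keyword, int first, int count) {
        String q = URLEncoder.encode(keyword, StandardCharsets.UTF_8);
        return BASE + "?q=" + q + "&first=" + first + "&count=" + count + "&mmasync=1";
    }

    public static void main(String[] args) {
        // prints https://cn.bing.com/images/async?q=cute+cat&first=1&count=20&mmasync=1
        System.out.println(build("cute cat", 1, 20));
    }
}
```

Paging then amounts to calling build again with a larger first value.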

Opening this URL in a browser and pressing F12 to inspect the response shows that it returns HTML text, which still has to be parsed.

In Java, the HTML can be parsed with regular expressions or with a third-party library. jsoup is recommended here:

        <!-- HTML parsing: https://jsoup.org/ -->
        <dependency>
            <groupId>org.jsoup</groupId>
            <artifactId>jsoup</artifactId>
            <version>1.15.3</version>
        </dependency>

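In the project, the request is sent through the jsoup API and the resulting Document instance is parsed to extract the real image URLs from the search-result HTML. The images can then be saved locally or uploaded to third-party storage.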
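
For comparison, the regular-expression route mentioned above can be sketched with the JDK alone. This is not the jsoup approach this article uses, and it is far less robust than a real HTML parser. It assumes double-quoted attributes and that the result images carry the CSS class mimg. The class name ImgSrcExtractor and the sample HTML are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects the src of every <img> whose class list contains "mimg".
class ImgSrcExtractor {
    private static final Pattern IMG_TAG = Pattern.compile("<img\\b[^>]*>", Pattern.CASE_INSENSITIVE);
    // The lookbehind keeps attributes such as data-src or data-class from matching.
    private static final Pattern CLASS_ATTR = Pattern.compile("(?<![\\w-])class=\"([^\"]*)\"");
    private static final Pattern SRC_ATTR = Pattern.compile("(?<![\\w-])src=\"([^\"]*)\"");

    static List<String> extract(String html) {
        List<String> urls = new ArrayList<>();
        Matcher tag = IMG_TAG.matcher(html);
        while (tag.find()) {
            String imgTag = tag.group();
            Matcher cls = CLASS_ATTR.matcher(imgTag);
            // Skip tags with no class attribute or without "mimg" among the class names.
            if (!cls.find() || !Arrays.asList(cls.group(1).trim().split("\\s+")).contains("mimg")) {
                continue;
            }
            Matcher src = SRC_ATTR.matcher(imgTag);
            if (src.find() && !src.group(1).isEmpty()) {
                urls.add(src.group(1));
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        String html = "<div class=\"dgControl\">"
                + "<img class=\"mimg\" src=\"https://example.com/a.jpg?w=100\">"
                + "<img class=\"logo\" src=\"https://example.com/logo.png\">"
                + "<img src=\"https://example.com/b.jpg\" class=\"mimg rms_img\">"
                + "</div>";
        // prints [https://example.com/a.jpg?w=100, https://example.com/b.jpg]
        System.out.println(extract(html));
    }
}
```

With jsoup, the same filtering is a single CSS selector (img.mimg), which is the main reason to prefer the library.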

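1.2 Use in a Project

In the project, the overall design is that the front-end page lets the user enter the search keyword, the number of images to fetch, and the number of batches.
The back end receives these values in a DTO: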

import lombok.Data;

@Data
public class PictureUploadByBatchRequest {

    /**
     * Search keyword
     */
    private String searchText;

    /**
     * Number of images to fetch per batch
     */
    private Integer count = 5;

    /**
     * Number of batches to run
     */
    private Integer batch = 1;
}

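The back-end design adds an image fetch record table, whose main job is to record the current offset for each search keyword. The table structure is shown below. The fetched images are ultimately uploaded to Tencent Cloud Object Storage.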

create table first.picture_fetch_record
(
    id            bigint                             not null comment 'Primary key ID' primary key,
    keyWord       varchar(255)                       null comment 'Search keyword',
    currentOffset int                                null comment 'Current search offset',
    userId        bigint                             null comment 'Operator user ID',
    createTime    datetime default CURRENT_TIMESTAMP not null comment 'Creation time',
    updateTime    datetime default CURRENT_TIMESTAMP not null on update CURRENT_TIMESTAMP comment 'Update time'
) comment 'Image fetch record table';
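
To make the role of currentOffset concrete, here is a small in-memory sketch of the bookkeeping. A HashMap stands in for the picture_fetch_record table, and the class name OffsetTracker and the method nextFirst are hypothetical. It assumes the rule that the first fetch for a keyword starts at 1 and each later fetch starts count positions after the previous one.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical stand-in for the picture_fetch_record table.
class OffsetTracker {
    // keyword -> start position used by the most recent fetch (the currentOffset column)
    private final Map<String, Integer> offsets = new HashMap<>();

    /** Returns the value to send as `first` for the next fetch and stores it as the new offset. */
    int nextFirst(String keyword, int count) {
        Integer last = offsets.get(keyword);
        int first = (last == null) ? 1 : last + count;
        offsets.put(keyword, first);
        return first;
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        System.out.println(tracker.nextFirst("cat", 5)); // prints 1
        System.out.println(tracker.nextFirst("cat", 5)); // prints 6
        System.out.println(tracker.nextFirst("cat", 5)); // prints 11
        System.out.println(tracker.nextFirst("dog", 5)); // prints 1 (separate keyword, separate offset)
    }
}
```

Because the offset is stored per keyword, a later request for the same keyword continues where the previous one stopped instead of fetching the same images again.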

The complete service-layer implementation is as follows:

    @Override
    public Integer adminFetchPicture(PictureUploadByBatchRequest pictureUploadByBatchRequest,
                                     HttpServletRequest httpServletRequest) {
        String searchText = pictureUploadByBatchRequest.getSearchText();
        Integer count = pictureUploadByBatchRequest.getCount();
        Integer batch = pictureUploadByBatchRequest.getBatch();
//        ThrowUtils.throwIf(count > 30, ErrorCode.OPERATION_ERROR, "A batch fetch cannot exceed 30 images");
//        ThrowUtils.throwIf(batch > 6, ErrorCode.OPERATION_ERROR, "A batch fetch cannot exceed 6 batches");
        User loginUser = userService.getLoginUser(httpServletRequest);
        int totalCount = 0;
        for (int i = 0; i < batch; i++) {
            // First check whether this keyword already has a fetch record
            PictureFetchRecord pictureFetchRecord = pictureFetchRecordService.lambdaQuery()
                    .eq(PictureFetchRecord::getKeyWord, searchText)
                    .one();
            // URL to fetch: start at 1 for a new keyword, otherwise continue after the stored offset.
            // Note: searchText is inserted as-is, so keywords with spaces or
            // non-ASCII characters may need URL encoding first.
            String fetchUrl = String.format("https://cn.bing.com/images/async?q=%s&mmasync=1&first=%s",
                    searchText,
                    ObjUtil.isEmpty(pictureFetchRecord) ? 1 : pictureFetchRecord.getCurrentOffset() + count);
            // Fetch and parse the document
            Document document;
            try {
                document = Jsoup.connect(fetchUrl).get();
            } catch (IOException e) {
                log.error("Failed to fetch page", e);
                throw new BusinessException(ErrorCode.OPERATION_ERROR, "Failed to fetch page");
            }
            Element div = document.getElementsByClass("dgControl").first();
            if (ObjUtil.isNull(div)) {
                throw new BusinessException(ErrorCode.OPERATION_ERROR, "Failed to find element");
            }
            Elements imgElementList = div.select("img.mimg");
            int uploadCount = 0;
            for (Element imgElement : imgElementList) {
                String fileUrl = imgElement.attr("src");
                if (StrUtil.isBlank(fileUrl)) {
                    log.info("Current link is empty, skipped: {}", fileUrl);
                    continue;
                }
                // Drop the query string from the image URL to avoid escaping problems
                int questionMarkIndex = fileUrl.indexOf("?");
                if (questionMarkIndex > -1) {
                    fileUrl = fileUrl.substring(0, questionMarkIndex);
                }
                PictureUploadRequest pictureUploadRequest = new PictureUploadRequest();
                try {
                    pictureUploadRequest.setPicName(searchText + "(" + System.currentTimeMillis() + ")");
                    // Upload the image
                    PictureVO pictureVO = this.uploadPicture(fileUrl, pictureUploadRequest, loginUser);
                    log.info("Image uploaded successfully, id = {}", pictureVO.getId());
                    uploadCount++;
                    totalCount++;
                } catch (Exception e) {
                    log.error("Image upload failed", e);
                    continue;
                }
                // Stop once this batch has uploaded the requested number of images
                if (uploadCount >= count) {
                    break;
                }
            }
            // Create the record on the first fetch, otherwise advance the stored offset by count
            if (ObjUtil.isEmpty(pictureFetchRecord)) {
                PictureFetchRecord record = new PictureFetchRecord();
                record.setKeyWord(searchText);
                record.setUserId(loginUser.getId());
                record.setCurrentOffset(1);
                pictureFetchRecordService.save(record);
            } else {
                pictureFetchRecord.setCurrentOffset(pictureFetchRecord.getCurrentOffset() + count);
                pictureFetchRecordService.updateById(pictureFetchRecord);
            }
        }
        return totalCount;
    }
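
The query-string cleanup inside the loop can be tried out in isolation. The sketch below pulls that step into a hypothetical helper, UrlCleaner.stripQuery, and runs it against made-up example.com URLs. It assumes the image can still be downloaded without its query parameters.

```java
// Hypothetical helper mirroring the indexOf/substring step in the service method.
class UrlCleaner {
    /** Returns the URL without its query string; URLs without '?' are returned unchanged. */
    static String stripQuery(String url) {
        int questionMarkIndex = url.indexOf('?');
        return questionMarkIndex > -1 ? url.substring(0, questionMarkIndex) : url;
    }

    public static void main(String[] args) {
        System.out.println(stripQuery("https://example.com/a.jpg?w=100&h=80")); // prints https://example.com/a.jpg
        System.out.println(stripQuery("https://example.com/b.jpg"));            // prints https://example.com/b.jpg
    }
}
```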