
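[Scenario Question] Deduplicating a List

Deduplicating a List

  • 1. The question
  • 2. Test scenario
  • 3. Example code
  • 4. Performance comparison (measured results)
  • 5. Conclusions and recommendations
  • 6. Additional performance tips
  • 7. With complex objects, avoid Stream for ten-million-scale data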

1. The question

What are the ways to deduplicate a List, and how do they compare on performance, readability, and memory overhead?

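2. Test scenario

Assume a List<Integer> of 10 million elements, each drawn at random from [0, 900000), so at most 900,000 distinct values remain after deduplication and most elements are duplicates.
The following deduplication approaches are compared: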

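| No. | Approach | Preserves order | Key characteristics |
|-----|----------|-----------------|---------------------|
| 1 | HashSet | ❌ No | Fast; order not guaranteed |
| 2 | LinkedHashSet | ✅ Yes | Fast and keeps insertion order |
| 3 | TreeSet | ✅ Natural ordering | Sorted, but the slowest of the Set-based options |
| 4 | Stream + distinct() | ✅ Yes | Concise, modern style; no slower than HashSet |
| 5 | Nested loops (traditional) | ✅ Yes | High time complexity, O(n²) |
| 6 | Map keys | ✅ Yes | Flexible; performance close to LinkedHashSet |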
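
The nested-loop approach listed in the table is not part of the benchmark code in section 3. A minimal sketch of what it might look like is given here for reference; the class and method names are illustrative and do not come from the original code.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Objects;

// Illustrative sketch; class and method names are assumptions, not from the article.
class NestedLoopDedup {

    // Keeps the first occurrence of each element, preserving order.
    // Every input element is compared against the whole result list: O(n^2) worst case.
    static <T> List<T> dedup(List<T> input) {
        List<T> result = new ArrayList<>();
        for (T candidate : input) {
            boolean seen = false;
            for (T kept : result) {
                if (Objects.equals(kept, candidate)) {
                    seen = true;
                    break;
                }
            }
            if (!seen) {
                result.add(candidate);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(dedup(List.of(3, 1, 3, 2, 1))); // prints [3, 1, 2]
    }
}
```

It needs no extra data structure, which is why its space overhead is the lowest, but the inner scan grows with the number of distinct values already kept.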

3. Example code

    package com.beijing.controller;

    import java.util.*;
    import java.util.function.Function;
    import java.util.stream.Collectors;

    public class DistinctListTest {

        private static final int DATA_SIZE = 10_000_000; // data size
        private static final int RUNS = 5;               // repetitions per method

        public static void main(String[] args) {
            // Build random test data
            List<Integer> list = new ArrayList<>(DATA_SIZE);
            Random random = new Random();
            for (int i = 0; i < DATA_SIZE; i++) {
                list.add(random.nextInt(900_000)); // values in [0, 900000), so many duplicates
            }

            System.out.println("========================================");
            System.out.println("Test started, sample size: " + list.size());
            System.out.println("Each method is run " + RUNS + " times; times in ms");
            System.out.println("========================================\n");

            testMethod("① HashSet dedup", list, l -> new ArrayList<>(new HashSet<>(l)));
            testMethod("② LinkedHashSet dedup", list, l -> new ArrayList<>(new LinkedHashSet<>(l)));
            testMethod("③ TreeSet dedup", list, l -> new ArrayList<>(new TreeSet<>(l)));
            testMethod("④ Stream distinct dedup", list, l -> l.stream().distinct().collect(Collectors.toList()));
            testMethod("⑤ Map dedup", list, l -> {
                Map<Integer, Boolean> map = new LinkedHashMap<>();
                for (Integer i : l) map.put(i, true);
                return new ArrayList<>(map.keySet());
            });

            System.out.println("\n========================================");
            System.out.println("Test finished ✅");
            System.out.println("========================================");
        }

        /**
         * Generic test helper: runs the method several times and averages the elapsed time.
         */
        private static void testMethod(String name, List<Integer> list,
                                       Function<List<Integer>, List<Integer>> func) {
            long total = 0;
            int resultSize = 0;
            System.out.printf("%s:\n", name);
            for (int i = 1; i <= RUNS; i++) {
                long start = System.nanoTime();
                List<Integer> result = func.apply(list);
                long cost = (System.nanoTime() - start) / 1_000_000; // convert to ms
                total += cost;
                resultSize = result.size();
                System.out.printf("   Run %d: %4d ms\n", i, cost);
            }
            double avg = total * 1.0 / RUNS;
            System.out.printf("👉 Average: %.2f ms, result size: %d\n\n", avg, resultSize);
        }
    }

Console output from one run on JDK 21, launched from IntelliJ IDEA (the java command line and classpath are omitted):

    ========================================
    Test started, sample size: 10000000
    Each method is run 5 times; times in ms
    ========================================

    ① HashSet dedup:
       Run 1:  719 ms
       Run 2:  617 ms
       Run 3:  651 ms
       Run 4:  650 ms
       Run 5:  622 ms
    👉 Average: 651.80 ms, result size: 899989

    ② LinkedHashSet dedup:
       Run 1:  699 ms
       Run 2:  678 ms
       Run 3:  677 ms
       Run 4:  708 ms
       Run 5:  710 ms
    👉 Average: 694.40 ms, result size: 899989

    ③ TreeSet dedup:
       Run 1: 4662 ms
       Run 2: 4505 ms
       Run 3: 4432 ms
       Run 4: 3950 ms
       Run 5: 4652 ms
    👉 Average: 4440.20 ms, result size: 899989

    ④ Stream distinct dedup:
       Run 1:  583 ms
       Run 2:  548 ms
       Run 3:  486 ms
       Run 4:  564 ms
       Run 5:  598 ms
    👉 Average: 555.80 ms, result size: 899989

    ⑤ Map dedup:
       Run 1:  634 ms
       Run 2:  591 ms
       Run 3:  584 ms
       Run 4:  699 ms
       Run 5:  566 ms
    👉 Average: 614.80 ms, result size: 899989

    ========================================
    Test finished ✅
    ========================================

    Process finished with exit code 0

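4. Performance comparison (measured results)

| Approach | Average time (10,000,000 elements) | Preserves order | Space overhead | Notes |
|----------|------------------------------------|-----------------|----------------|-------|
| HashSet | ~651.80 ms | ❌ No | Medium | Fast, but loses order |
| LinkedHashSet | ~694.40 ms | ✅ Yes | Medium | Most recommended for real projects |
| TreeSet | ~4440.20 ms | ✅ Sorted | Higher | Suited to cases that need sorting |
| Stream.distinct() | ~555.80 ms | ✅ Yes | Medium to high | Concise; fastest average in this run |
| Nested loops | > 100000 ms (not part of the benchmark code above) | ✅ Yes | Lowest | Performance disaster, O(n²) |
| Map keySet | ~614.80 ms | ✅ Yes | Medium | Comparable to LinkedHashSet |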
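
| Approach | Internal mechanism | Passes over the data | Copies | Typical speed |
|----------|--------------------|----------------------|--------|---------------|
| HashSet | Two manual copies (list to set, set to list) | 2 | 2 | Medium |
| LinkedHashSet | Order-preserving, plus 2 copies | 2 | 2 | Slightly slower |
| TreeSet | Red-black tree sorting | 2 | 2 | Slowest, though still faster than nested for loops |
| Stream.distinct() | Single streaming pass | 1 | 1 | Usually the fastest (after JIT optimization) |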

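5. Conclusions and recommendations

| Scenario | Recommended approach | Reason |
|----------|----------------------|--------|
| Order does not matter | new HashSet<>(list) | Lowest overhead of the Set options |
| Original order must be kept | new LinkedHashSet<>(list) | Fast and order-stable |
| Result must be sorted | new TreeSet<>(list) | Sorts automatically, but is the slowest Set option |
| Concise code | list.stream().distinct().toList() | Readable (JDK 16+) |
| Performance-sensitive with custom logic | LinkedHashMap | More flexible; can be extended with counting logic and the like |
| Ten-million-scale data (complex objects, see section 7) | Avoid Stream (many objects created) | Plain collections are more efficient |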
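
The LinkedHashMap row mentions extending deduplication with counting logic. A minimal sketch of that idea, assuming the goal is to keep first-seen order while recording how often each value occurred; the class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative sketch; names are assumptions, not from the article.
class CountingDedup {

    // Maps each distinct element to its occurrence count, in first-seen order.
    static <T> Map<T, Integer> countOccurrences(List<T> input) {
        Map<T, Integer> counts = new LinkedHashMap<>();
        for (T item : input) {
            counts.merge(item, 1, Integer::sum); // insert 1, or add 1 to the existing count
        }
        return counts;
    }

    public static void main(String[] args) {
        Map<String, Integer> counts = countOccurrences(List.of("b", "a", "b", "c", "a", "b"));
        List<String> distinct = new ArrayList<>(counts.keySet());
        System.out.println(distinct); // prints [b, a, c]
        System.out.println(counts);   // prints {b=3, a=2, c=1}
    }
}
```

The key set is the deduplicated list, and the values come at no extra pass over the data.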

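6. Additional performance tips

  1. For simple element types (Integer, String):
    LinkedHashSet is the best all-round choice.

  2. For objects (for example a custom Person class):
    hashCode() and equals() must be implemented correctly, otherwise duplicates will not be detected.

  3. For very large data sets (> 10M elements):
    Consider deduplicating in batches (add each batch to a Set, then merge) to lower peak memory.

  4. For more than 10^6 elements, list.parallelStream().distinct().collect(Collectors.toList()); can be faster, provided the elements are simple types such as Integer. Measure before relying on it; if encounter order does not matter, calling unordered() before distinct() can make the parallel version considerably cheaper.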
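
Tip 2 can be illustrated with a small sketch. The Person class and its fields (name, age) are hypothetical; the point is only that equals() and hashCode() are defined over the same fields, so hash-based collections treat equal persons as duplicates.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Objects;

// Illustrative sketch; the Person class and its fields are assumptions.
class PersonDedup {

    static final class Person {
        private final String name;
        private final int age;

        Person(String name, int age) {
            this.name = name;
            this.age = age;
        }

        @Override
        public boolean equals(Object o) {
            if (this == o) return true;
            if (!(o instanceof Person)) return false;
            Person other = (Person) o;
            return age == other.age && Objects.equals(name, other.name);
        }

        @Override
        public int hashCode() {
            // Must use the same fields as equals()
            return Objects.hash(name, age);
        }

        @Override
        public String toString() {
            return name + "(" + age + ")";
        }
    }

    // Order-preserving deduplication that relies on equals()/hashCode()
    static List<Person> dedup(List<Person> people) {
        return new ArrayList<>(new LinkedHashSet<>(people));
    }

    public static void main(String[] args) {
        List<Person> people = List.of(
                new Person("Ann", 30), new Person("Bob", 25), new Person("Ann", 30));
        System.out.println(dedup(people)); // prints [Ann(30), Bob(25)]
    }
}
```

Without the two overrides, the default identity-based equals() would keep all three entries. On JDK 16+ a record generates both methods automatically.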

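7. With complex objects, avoid Stream for ten-million-scale data

Stream is implemented as an abstraction over a data pipeline.

list.stream().filter(...).map(...).distinct().collect(Collectors.toList());

Behind this call, the Stream library builds a chain of nested pipeline objects:

ArrayListSpliterator → Head → FilterOp → MapOp → DistinctOp → Collector

Each layer is an object, and each layer wraps a lambda expression.
As data flows through, the layers invoke their callbacks one after another.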
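
To make the layering concrete, here is a deliberately simplified model of such a chain. It is not the JDK's internal implementation; the class MiniPipeline and its stage methods are hypothetical and only mimic the idea that each stage wraps a lambda plus the next stage.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.function.Consumer;
import java.util.function.Function;
import java.util.function.Predicate;

// Simplified model of a push-style pipeline; these are NOT the JDK's internal classes.
class MiniPipeline {

    // Forwards an element downstream only if it passes the test.
    static <T> Consumer<T> filterStage(Predicate<T> test, Consumer<T> downstream) {
        return item -> {
            if (test.test(item)) downstream.accept(item);
        };
    }

    // Transforms an element and forwards the result downstream.
    static <T, R> Consumer<T> mapStage(Function<T, R> mapper, Consumer<R> downstream) {
        return item -> downstream.accept(mapper.apply(item));
    }

    // Forwards an element downstream only the first time it is seen.
    static <T> Consumer<T> distinctStage(Consumer<T> downstream) {
        Set<T> seen = new HashSet<>(); // state kept for the whole traversal
        return item -> {
            if (seen.add(item)) downstream.accept(item);
        };
    }

    static List<Integer> run(List<Integer> source) {
        List<Integer> result = new ArrayList<>();
        // The chain is built back to front: collector, then distinct, map, filter.
        Consumer<Integer> collect = result::add;
        Consumer<Integer> distinct = distinctStage(collect);
        Consumer<Integer> halve = mapStage(x -> x / 2, distinct);
        Consumer<Integer> evens = filterStage(x -> x % 2 == 0, halve);
        for (Integer item : source) {
            evens.accept(item); // each element passes through every layer's callback
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(run(List.of(1, 2, 4, 4, 6, 2, 8))); // prints [1, 2, 3, 4]
    }
}
```

In this model the stage objects are created once per pipeline, while the set inside the distinct stage grows with the data.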

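| Type | Role | Number created |
|------|------|----------------|
| Stream instance | Pipeline entry point | 1 |
| PipelineHelper | Maintains context | 1 |
| Sink | One per intermediate operation | 3 to 5 |
| Spliterator | Splits the data source | 1 (or more) |
| Lambda object | At least one per operation | N |
| Temporary framework objects | Collected inside distinct | Roughly the number of elements |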

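When the data reaches the ten-million scale (10^7):

  • even at a few dozen bytes per object, hundreds of MB of short-lived objects are created;
  • these objects are all allocated in Eden (the young generation);
  • the JVM will therefore trigger Minor GC frequently.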
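
The "hundreds of MB" figure follows from simple multiplication. The sketch below works it through with an assumed size of 32 bytes per object; that figure is an illustrative assumption, not a measurement from the article, and the class name is hypothetical.

```java
// Illustrative arithmetic only; the 32-byte object size is an assumption.
class AllocationEstimate {

    // Rough estimate: total bytes = number of objects * bytes per object.
    static long estimateBytes(long objectCount, long bytesPerObject) {
        return objectCount * bytesPerObject;
    }

    public static void main(String[] args) {
        long bytes = estimateBytes(10_000_000L, 32L);
        // prints "320000000 bytes = about 305 MiB"
        System.out.printf("%d bytes = about %d MiB%n", bytes, bytes / (1024 * 1024));
    }
}
```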
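
| Data size | Traditional LinkedHashSet (approx. memory) | Stream.distinct() (approx. memory) |
|-----------|--------------------------------------------|------------------------------------|
| 100 thousand | about 15 MB | about 20 MB |
| 1 million | about 100 MB | about 160 MB |
| 10 million | about 900 MB | > 1.6 GB |
| 50 million | about 4.5 GB | > 9 GB (prone to OOM) |

💡 Reason: Stream creates many short-lived objects, which adds GC pressure and allocation cost.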
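
| Scenario | Recommended approach | Reason |
|----------|----------------------|--------|
| Up to millions of elements | Stream.distinct() | Readable, and fast enough |
| Tens of millions of elements or more | new LinkedHashSet<>(list) | More predictable memory, no intermediate objects |
| Multi-threaded speedup needed | parallelStream().distinct() | Uses multiple cores, but peak memory still needs watching |
| Very large scale (> 100 million) | Batch processing + HashSet merge | Avoids a single pipeline holding too much memory |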
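
The last row, batch processing with a HashSet merge, might be sketched as follows. This is a minimal sketch under the assumption that the data arrives as a sequence of batches (for example, pages read from a file or database); the class, the method names, and the partition helper are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Illustrative sketch; names and the batching scheme are assumptions.
class BatchDedup {

    // Deduplicates batch by batch and merges into one global set, so only
    // one batch plus the distinct values seen so far is held at a time.
    static <T> Set<T> dedupInBatches(Iterable<List<T>> batches) {
        Set<T> merged = new HashSet<>();
        for (List<T> batch : batches) {
            Set<T> local = new HashSet<>(batch); // per-batch deduplication
            merged.addAll(local);                // merge into the global result
        }
        return merged;
    }

    // Hypothetical helper that slices an in-memory list into fixed-size views.
    static <T> List<List<T>> partition(List<T> source, int batchSize) {
        List<List<T>> batches = new ArrayList<>();
        for (int from = 0; from < source.size(); from += batchSize) {
            int to = Math.min(from + batchSize, source.size());
            batches.add(source.subList(from, to));
        }
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> data = List.of(1, 2, 2, 3, 3, 3, 4, 1, 5, 5);
        Set<Integer> distinct = dedupInBatches(partition(data, 4));
        System.out.println(distinct.size()); // prints 5
    }
}
```

The merged set still has to hold every distinct value, so the saving comes from not keeping the full input and a full-size intermediate structure in memory together. A HashSet does not preserve order; a LinkedHashSet could be swapped in if first-seen order matters.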