当前位置: 首页 > news >正文

内存泄漏系列专题分析之二十:camx swap内存泄漏实例分析

【关注我,后续持续新增专题博文,谢谢!!!】

上一篇我们讲了内存泄漏系列专题分析之十九: 开机内存占用超标分析一:cameraserver超标1.8M分析

        这一篇我们开始讲 内存泄漏系列专题分析之二十:swap内存泄漏实例分析

目录

一、背景

二、:swap内存泄漏思路

    2.1:首先确认camx hal进程内存占用是否正常

    2.2 :首先需确认camx hal进程内存占用是否正常?是否泄漏?

    2.3 :基于原理来分析swap内存泄漏

    2.4 :分析低内存下kernel dump进程内存分布

    2.5 :导出Camx HAL进程的内存细化情况

    2.6 :此时,12小时之前不增长,后面swap异常增长变得难以理解了

    2.7 :继续压测monkey正式版本,无法复现

    2.8 :基于测试情况对老化版本的猜测

    2.9 :使用老化版本高负载下压测,果然复现

    2.10 :基于上面swap原理分析,我们开始复现并分析内存泄漏

    2.11 :分析Camx HAL fdnode代码

    2.12 :加日志debug

    2.13 :解决问题

    2.14 :总结


一、背景

        之前一个项目出现老化过程中,出现相机等各种app ANR爆了。分析发现camx hal进程swap内存持续增长到4.5G,很明显的内存泄漏。而swap内存并没有用户态分配内存接口,它由kernel内核控制,导致内存泄漏分析思路完全颠覆了,常规直接切入swap内存分配无法进行,此时该如何分析呢?

二、:swap内存泄漏思路

    2.1:首先确认camx hal进程内存占用是否正常

1:我们在这份日志里找到 (4,544,364K in swap),即camx hal进程swap内存已经达到4.5G,而此时相机是退出状态,也说明了这个swap内存并没有释放,即使相机关闭状态,也一直驻留在PSS里。

1739472003612@1739472703954@2025_02_14_02_52_53@SystemLog.zip    4,597,384K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (4,544,364K in swap)

    2.2 :首先需确认camx hal进程内存占用是否正常?是否泄漏?

        我们下载老化单台机器所有日志进一步确认是否持续增长,结果下载老化几天的日志,dump出camx hal进程内存不同时间段的情况。

        可以发现老化压测几天几夜的过程中,单台机器的Camx HAL进程swap内存一直是持续增长过程,无论相机是否打开,swap内存就一直没有降低或释放过,因此我们断定swap内存泄漏了。

单台机器的详细日志:相机camx hal进程内存增长情况:
1739266999645@1739326404235@2025_02_12_10_28_48@SystemLog.zip    198,013K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (  145,308K in swap)
1739327328830@1739341135895@2025_02_12_14_41_18@SystemLog.zip    894,815K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (  841,472K in swap)
1739341385357@1739357045550@2025_02_12_18_45_45@SystemLog.zip    1,501,088K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (1,447,940K in swap)
1739368747545@1739369513656@2025_02_12_22_13_35@SystemLog.zip    2,253,697K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (2,200,436K in swap)
1739391291701@1739392395773@2025_02_13_04_34_54@SystemLog.zip    3,112,163K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (3,059,024K in swap)
1739472003612@1739472703954@2025_02_14_02_52_53@SystemLog.zip    4,597,384K: vendor.qti.camera.provider@2.7-service_64 (pid 1634)        (4,544,364K in swap)

    2.3 :基于原理来分析swap内存泄漏

        这时需要理解swap内存的定义是什么?是否真的内存泄漏了?swap内存的来源是什么?
        swap主要是在内存不够用的时候,将部分内存上的数据交换到swap空间上,以便让系统不会因为内存不够用而导致oom或者更致命的情况出现。当内存使用存在压力的时候,开始触发内存回收行为,就可能会使用swap空间。Swap 把这些不常访问的内存先写到磁盘中,然后释放这些内存,给其他更需要的进程使用。再次访问这些内存时,重新从磁盘读入内存就可以了

        swap内存主要是压缩不活跃的内存,如果内存泄漏了,swap的来源肯定也是持续增长趋势。通常情况下,swap不会去压缩ION内存,更多是压缩malloc分配的ANON等匿名页,也就是进程通过malloc/mmap使用的内存部分。于是我们需要细化观察进程各个内存变化过程。

    2.4 :分析低内存下kernel dump进程内存分布

通常如下:(内存理论知识,后续补充发布)
======= dump_procs
comm             32   uid s f   pid  ppid   oom       vss    anon    file   shmem    swap
vendor.qti.came   0  1047 S    1634     1 -1000   2276752   56001   27834     181  463869
======= dump_procs_dmabuf_info
comm             pid   size
vendor.qti.came   1634 249424

    2.5 :导出Camx HAL进程的内存细化情况

        我们写个脚本导出所有老化过程中这个进程的内存细化情况,如下:
        梳理swap的过程数据,发现swap后期恶化了,过程数据可以说明和相机无关。压测14小时相机swap稳定在150~200M左右,没有增长。大概02-12 08点之后长时间(几个小时)低内存状态,但这个低内存不是相机导致的。swap恶化时间段,相机是没运行的,很可能是长时间低内存导致不断恶化。

        日志详解:

{"HAL_Memory":{"pid":"1634","pss":"106200","swapPss":"272728","total":"378928"}}

表示Camx HAL进程的pid是1634,pss是106M,swap内存272M,合计378M。

kernel_log_20250212_074848.txt:02-12 09:15:23.704673    vendor.qti.came   0  1047 S    1634     1 -1000   1771517   89069   30060     510   79668   //356M 318M

表示AON内存是89069,size是89069 * 4 = 356M,swap内存是79668,size是79668 * 4 = 318M(这里的单位是page,即4K,所以要乘以4)

android_log_20250211_181654.txt:02-11 18:17:58.134540  2419  2704 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"107716","swapPss":"140076","total":"247792"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:13.836740  2419  2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"452426","swapPss":"271648","total":"724074"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:15.554367  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"111951","swapPss":"271468","total":"383419"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:18.157295  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106260","swapPss":"274080","total":"380340"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:19.174628  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"433823","swapPss":"273476","total":"707299"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:19.519812  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"476763","swapPss":"273424","total":"750187"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:20.854754  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"105258","swapPss":"273416","total":"378674"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:25.353172  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106388","swapPss":"273416","total":"379804"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:26.068137  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"459636","swapPss":"272712","total":"732348"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:26.154126  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"486884","swapPss":"272712","total":"759596"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:27.256947  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"493341","swapPss":"273156","total":"766497"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:28.080545  2419 14677 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"530484","swapPss":"272756","total":"803240"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:28.599953  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"225294","swapPss":"272752","total":"498046"}}, len: 80
android_log_20250212_090446.txt:02-12 09:05:29.195314  2419  2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"106200","swapPss":"272728","total":"378928"}}, len: 80kernel_log_20250212_074848.txt:02-12 09:15:23.704673    vendor.qti.came   0  1047 S    1634     1 -1000   1771517   89069   30060     510   79668   //356M 318M
kernel_log_20250212_074848.txt:02-12 09:15:23.719473    vendor.qti.came   1634 534520
kernel_log_20250212_074848.txt:02-12 09:15:23.732906    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:15:23.779898    vendor.qti.came  1634  57436
kernel_log_20250212_074848.txt:02-12 09:22:26.605359    vendor.qti.came   0  1047 S    1634     1 -1000   1787105   89807   32791     486   90479	//350M 361M
kernel_log_20250212_074848.txt:02-12 09:22:26.641372    vendor.qti.came   1634 543048
kernel_log_20250212_074848.txt:02-12 09:22:26.676573    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:22:26.713047    vendor.qti.came  1634  37016
kernel_log_20250212_074848.txt:02-12 09:23:22.603924    vendor.qti.came   0  1047 S    1634     1 -1000   1696732   87060   26704     460   90752
kernel_log_20250212_074848.txt:02-12 09:23:22.607066    vendor.qti.came   1634 374220
kernel_log_20250212_074848.txt:02-12 09:23:22.613789    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:23:22.625893    vendor.qti.came  1634  21600
kernel_log_20250212_074848.txt:02-12 09:27:39.210729    vendor.qti.came   0  1047 S    1634     1 -1000   1785854   88981   31053     486   90044
kernel_log_20250212_074848.txt:02-12 09:27:39.221070    vendor.qti.came   1634 545020
kernel_log_20250212_074848.txt:02-12 09:27:39.238331    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:27:39.251598    vendor.qti.came  1634  57436
kernel_log_20250212_074848.txt:02-12 09:27:49.209565    vendor.qti.came   0  1047 R    1634     1 -1000   1970717   58090   28009     695  135891   //232M 543M
kernel_log_20250212_074848.txt:02-12 09:27:49.219898    vendor.qti.came   1634 1209240
kernel_log_20250212_074848.txt:02-12 09:27:49.231472    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:27:49.235120    vendor.qti.came  1634  57436
kernel_log_20250212_074848.txt:02-12 09:28:46.534259    vendor.qti.came   0  1047 S    1634     1 -1000   1799432   89018   34058     466   90156
kernel_log_20250212_074848.txt:02-12 09:28:46.550716    vendor.qti.came   1634 477708
kernel_log_20250212_074848.txt:02-12 09:28:46.604629    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:28:46.613610    vendor.qti.came  1634  57436
kernel_log_20250212_074848.txt:02-12 09:30:23.664100    vendor.qti.came   0  1047 S    1634     1 -1000   1740557   87743   30190     464   90309
kernel_log_20250212_074848.txt:02-12 09:30:23.668068    vendor.qti.came   1634 463936
kernel_log_20250212_074848.txt:02-12 09:30:23.671577    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:30:23.687504    vendor.qti.came  1634  33752
kernel_log_20250212_074848.txt:02-12 09:30:38.004229    vendor.qti.came   0  1047 S    1634     1 -1000   1776584   88696   30085     469   90435
kernel_log_20250212_074848.txt:02-12 09:30:38.008920    vendor.qti.came   1634 488416
kernel_log_20250212_074848.txt:02-12 09:30:38.012524    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:30:38.014416    vendor.qti.came  1634  10756
kernel_log_20250212_074848.txt:02-12 09:31:30.068130    vendor.qti.came   0  1047 S    1634     1 -1000   1799189   88731   32692     469   90584
kernel_log_20250212_074848.txt:02-12 09:31:30.073191    vendor.qti.came   1634 528200
kernel_log_20250212_074848.txt:02-12 09:31:30.078858    vendor.qti.came   1634 2056
kernel_log_20250212_074848.txt:02-12 09:31:30.081252    vendor.qti.came  1634  47524
kernel_log_20250212_093326.txt:02-12 09:34:15.483537    vendor.qti.came   0  1047 R    1634     1 -1000   1696184   61990   29558     443   90922  //247m, 363M
kernel_log_20250212_093326.txt:02-12 09:34:15.518644    vendor.qti.came   1634 259644
kernel_log_20250212_093326.txt:02-12 09:34:15.569083    vendor.qti.came   1634 2056
kernel_log_20250212_093326.txt:02-12 09:34:15.576392    vendor.qti.came  1634  50804android_log_20250212_093815.txt:02-12 09:38:20.401365  2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"53561","swapPss":"441428","total":"494989"}}, len: 79
android_log_20250212_093815.txt:02-12 09:38:21.495971  2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"419354","swapPss":"404776","total":"824130"}}, len: 80
android_log_20250212_093815.txt:02-12 09:38:22.117979  2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"446997","swapPss":"402052","total":"849049"}}, len: 80
kernel_log_20250212_093326.txt:02-12 09:38:22.903274  5174  5174 I [57697.813535]osvelte: vendor.qti.came   0  1047 S    1634     1 -1000   1795295   89358   29790     526  100309
kernel_log_20250212_093326.txt:02-12 09:38:22.924792  5174  5174 I [57697.835053]osvelte: vendor.qti.came   1634 558480
kernel_log_20250212_093326.txt:02-12 09:38:22.989238  5174  5174 I [57697.899499]osvelte: vendor.qti.came   1634 2056
kernel_log_20250212_093326.txt:02-12 09:38:23.024083  5174  5174 I [57697.934344]osvelte: vendor.qti.came  1634  57436
android_log_20250212_093815.txt:02-12 09:38:23.974748  2419 16428 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"436127","swapPss":"400380","total":"836507"}}, len: 80
android_log_20250212_093815.txt:02-12 09:38:25.048338  2419  2419 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"131587","swapPss":"399708","total":"531295"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:45.990464  2419 16408 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"108438","swapPss":"524372","total":"632810"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:46.877925  2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"413473","swapPss":"522412","total":"935885"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:47.067209  2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"423721","swapPss":"522400","total":"946121"}}, len: 80
android_log_20250212_105715.txt:02-12 10:57:48.851122  2419 28204 I CameraServiceExtImpl.cpp: 969 onTransact() onTransact, str: {"HAL_Memory":{"pid":"1634","pss":"453347","swapPss":"522660","total":"976007"}}, len: 80

    2.6 :此时,12小时之前不增长,后面swap异常增长变得难以理解了

        于是测试和正面分析同步进行。经过压测大量测试场景,均未发现相机进程内存泄漏。一定程度排除内存泄漏的可能,而swap的写入写出不是用户态控制,由kernel内核态控制。分析内核日志后也无特别异常。

    2.7 :继续压测monkey正式版本,无法复现

压测monkey正式版本发现hal进程crash,测试完,相机退出,hal进程内存也非常正常,swap内存也只有几百M。

    2.8 :基于测试情况对老化版本的猜测

开始怀疑老化版本在高负载情况下才出现,正式版本一般情况下不会出现。

    2.9 :使用老化版本高负载下压测,果然复现

于是使用老化版本高负载下压测并dump内存果然出现heap增长,但不是每次都增长,而是概率性增长,而PSS增长到2G左右,swap内存开始增长。正式版本却很难复现。

    2.10 :基于上面swap原理分析,我们开始复现并分析内存泄漏

开启malloc debug进行复测,发现FDnode代码中VendorFDDeinit没有执行cv_face_destroy_tracker导致出现泄漏。其中malloc debug分析方式参考:内存泄漏系列专题分析之五:使用malloc_debug定位C/C++ native heap内存泄露

VOID FDManagerNode::VendorFDLoad(senseTime_lib_struct *p_lib)
{int rc = 0;struct timeval mStartTime, mEndTime;gettimeofday(&mStartTime, NULL);……………………………………ret = g_senseTime_lib.cv_face_create_tracker(&m_SenseTimeEngine.cv_tracker_handle, NULL, flag);if ((ret != CV_OK) || (m_SenseTimeEngine.cv_tracker_handle == NULL)){CAMX_LOG_ERROR(CamxLogGroupFD, "WIDE: cv_face_create_tracker error: %d", ret);return ret;}
}
VOID FDManagerNode::VendorFDDeinit()
{struct timeval mStartTime, mEndTime;gettimeofday(&mStartTime, NULL);if (m_SenseTimeEngine.cv_tracker_handle){g_senseTime_lib.cv_face_destroy_tracker(m_SenseTimeEngine.cv_tracker_handle);m_SenseTimeEngine.cv_tracker_handle = NULL;}CamxAtomicDecU(&m_engineCounter);VendorFDModelDeinit();if (m_attrFeature != NULL){free(m_attrFeature);m_attrFeature = NULL;}if (m_attrEmotion != NULL){free(m_attrEmotion);m_attrEmotion = NULL;}gettimeofday(&mEndTime, NULL);CAMX_LOG_INFO(CamxLogGroupFD, "FD deinit spends %lld ms (pending engine: %d)",((mEndTime.tv_sec - mStartTime.tv_sec) * 1000 + (mEndTime.tv_usec - mStartTime.tv_usec)/1000), m_engineCounter);
}

  .

    2.11 :分析Camx HAL fdnode代码

经过代码分析,FDnode申请和释放是对应关系,并没有特别异常。

申请内存:在VendorFDLoad里调用cv_face_create_tracker

释放内存:在VendorFDDeinit里调用cv_face_destroy_tracker

从代码上看,并没有内存泄漏。

  .

    2.12 :加日志debug

        于是加日志debug,发现FDnode是创建多线程运行,且多个锁操作和原子操作。而在性能极差的情况下,线程同步会出现问题,导致内存handler在构造函数创建线程时已经置空,导致另外一个线程在析构函数未释放。代码如下:

void FDManagerNode::OPLUSFDManagerNode()
{……//多线程这里全局变量置NULL了m_SenseTimeEngine.cv_tracker_handle = NULL;……
}VOID FDManagerNode::VendorFDDeinit()
{//导致这里无法进入是否内存if (m_SenseTimeEngine.cv_tracker_handle){g_senseTime_lib.cv_face_destroy_tracker(m_SenseTimeEngine.cv_tracker_handle);m_SenseTimeEngine.cv_tracker_handle = NULL;}
}

    2.13 :解决问题

解决线程同步异常问题就解决该问题。

    2.14 :总结

终于有了最终结论,性能极差下,非常低概率下,线程同步异常导致内存泄漏。

【关注我,后续持续新增专题博文,谢谢!!!】

下一篇讲解

相关文章:

  • Jenkins + Docker + Kubernetes(JKD)自动化部署全链路实践
  • 基于OpenCV的图像增强技术:直方图均衡化与自适应直方图均衡化
  • 零基础设计模式——行为型模式 - 备忘录模式
  • LVS 负载均衡详解:四层转发原理与三种经典模式全面解析
  • eureka如何绕过 LVS 的虚拟 IP(VIP),直接注册服务实例的本机真实 IP
  • 我们来学mysql -- 8.4版本记录慢查询
  • Spring MVC扩展与SSM框架整合
  • 传统机器学习与大模型 + Prompt 的对比示例
  • 计算机系统概述(5)
  • 使用Docker申请Let‘s Encrypt证书
  • 谈文件系统
  • AI Agent核心技术深度解析:Function Calling与ReAct对比报告
  • vue3笔记(1)自用
  • 开源、免费、美观的 Vue 后台管理系统模板
  • 当简约美学融入小程序 UI 设计:开启高效交互新篇
  • 将vmware下旧的黑群晖nas迁移到别的服务器,并对硬盘进行扩容
  • 深度学习:PyTorch简介
  • Qt绘制温度计源码分享
  • Vulkan学习笔记1—环境搭建
  • jsoncpp ubuntu编译问题
  • wordpress调用指定文章id/长沙优化网站推广
  • 做鸭服务的网站或群/产品软文怎么写
  • 十大卖衣服网站/扬州百度seo
  • 南宁广告公司网站建设/品牌营销案例
  • 机关网站建设存在的问题/站长之家whois查询
  • 自己做网站卖货多少钱/快速排名seo