
Analyzing and fixing a HiSilicon Hi3798MV200 (Android 7.0) device that hangs on the first boot image and never enters the system

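While working on units returned by a customer for repair, I ran into devices that stayed stuck on the boot logo and never entered the system. It did not happen on every boot; the failure rate was roughly 30%. I had seen this before, but so rarely, and a reboot always cleared it, that I never paid much attention. This was a good opportunity to dig into it properly.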

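Because the device hangs on the boot logo before the system is up, the only way to get a log is over the serial port. The captured log: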

healthd: battery l=100 v=0 t=42.4 h=2 st=2 chg=a
INFO: rcu_sched self-detected stall on CPUINFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 0, t=84007 jiffies, g=703, c=702, q=367)
Task dump for CPU 1:
kworker/1:1     R running      0   595      2 0x00000002
Workqueue: HIFB_WorkQueque HIFB_WBC_StartWorkQueue{ 1}  (t=84031 jiffies g=703 c=702 q=367)
Task dump for CPU 1:
kworker/1:1     R running      0   595      2 0x00000002
Workqueue: HIFB_WorkQueque HIFB_WBC_StartWorkQueue
[<c00188f8>] (unwind_backtrace) from [<c0013d44>] (show_stack+0x20/0x24)
[<c0013d44>] (show_stack) from [<c004f83c>] (sched_show_task+0xa8/0xf4)
[<c004f83c>] (sched_show_task) from [<c0052438>] (dump_cpu_task+0x3c/0x4c)
[<c0052438>] (dump_cpu_task) from [<c0070128>] (rcu_dump_cpu_stacks+0xa4/0xd8)
[<c0070128>] (rcu_dump_cpu_stacks) from [<c0073628>] (rcu_check_callbacks+0x428/0x764)
[<c0073628>] (rcu_check_callbacks) from [<c00783a8>] (update_process_times+0x50/0x70)
[<c00783a8>] (update_process_times) from [<c00890d0>] (tick_sched_handle+0x58/0x64)
[<c00890d0>] (tick_sched_handle) from [<c0089130>] (tick_sched_timer+0x54/0x9c)
[<c0089130>] (tick_sched_timer) from [<c0078e3c>] (__run_hrtimer+0xdc/0x1ac)
[<c0078e3c>] (__run_hrtimer) from [<c007970c>] (hrtimer_interrupt+0x12c/0x310)
[<c007970c>] (hrtimer_interrupt) from [<c05d51d0>] (arch_timer_handler_phys+0x40/0x48)
[<c05d51d0>] (arch_timer_handler_phys) from [<c006c930>] (handle_percpu_devid_irq+0xb4/0x11c)
[<c006c930>] (handle_percpu_devid_irq) from [<c00688d0>] (__handle_domain_irq+0x8c/0xe4)
[<c00688d0>] (__handle_domain_irq) from [<c00086a8>] (gic_handle_irq+0x30/0x6c)
[<c00086a8>] (gic_handle_irq) from [<c00148c0>] (__irq_svc+0x40/0x54)
Exception stack(0xddc7dce0 to 0xddc7dd28)
dce0: 00000003 debe2f00 00000003 00000001 c1266a78 debcbbc4 debcbbc0 c12673a0
dd00: 00000004 c0021d60 00000000 ddc7dd5c 00000003 ddc7dd28 c008db64 c008db88
dd20: 200f0013 ffffffff
[<c00148c0>] (__irq_svc) from [<c008db88>] (smp_call_function_many+0x214/0x2a0)
[<c008db88>] (smp_call_function_many) from [<c008dc88>] (on_each_cpu+0x38/0x58)
[<c008dc88>] (on_each_cpu) from [<c064c6f0>] (__dma_clear_buffer+0x124/0x150)
[<c064c6f0>] (__dma_clear_buffer) from [<c064d7c4>] (hil_mmb_alloc+0x108/0x2e8)
[<c064d7c4>] (hil_mmb_alloc) from [<c064a458>] (new_mmb+0x24/0x44)
[<c064a458>] (new_mmb) from [<c0649554>] (HI_DRV_SMMU_AllocAndMap+0x3c/0x110)
[<c0649554>] (HI_DRV_SMMU_AllocAndMap) from [<c0780fe0>] (OPTM_AllocAndMap+0xcc/0xf8)
[<c0780fe0>] (OPTM_AllocAndMap) from [<c07874d4>] (HIFB_WBC_InitLayer+0x2dc/0x40c)
[<c07874d4>] (HIFB_WBC_InitLayer) from [<c0787620>] (HIFB_WBC_StartWorkQueue+0x1c/0x20)
[<c0787620>] (HIFB_WBC_StartWorkQueue) from [<c003fecc>] (process_one_work+0x194/0x3ac)
[<c003fecc>] (process_one_work) from [<c0040128>] (worker_thread+0x44/0x4cc)
[<c0040128>] (worker_thread) from [<c00451b0>] (kthread+0xd8/0xf4)
[<c00451b0>] (kthread) from [<c000f960>] (ret_from_fork+0x14/0x20)
init: countError is 1
INFO: rcu_sched self-detected stall on CPUINFO: rcu_sched detected stalls on CPUs/tasks: { 1} (detected by 2, t=147012 jiffies, g=703, c=702, q=579)
Task dump for CPU 1:
kworker/1:1     R running      0   595      2 0x00000002
Workqueue: HIFB_WorkQueque HIFB_WBC_StartWorkQueue{ 1}  (t=147035 jiffies g=703 c=702 q=579)

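These kernel-level errors look daunting, but there is no need to panic. I first handed the log to an AI and asked it to help with the analysis:

✅ Key problem located:

🔴 Core anomaly: RCU stall + kworker that never yields

The log repeatedly shows: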

INFO: rcu_sched self-detected stall on CPU {1}
...
kworker/1:1 R running ... Workqueue: HIFB_WorkQueque HIFB_WBC_StartWorkQueue
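✅ Conclusion:

CPU1 is permanently occupied by the kworker thread, so the rcu_sched grace period can never complete, the rest of the boot sequence is starved, and the Android system services never start.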
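
To put the t= values in the stall reports into perspective: t is how long the grace period has been stuck, counted in jiffies. The log does not state the kernel tick rate, so the HZ value has to be assumed, and jiffies_to_ms() below is a hypothetical helper written for this note. A minimal sketch:

```c
#include <stdint.h>

/* Convert a jiffies count taken from a stall report into milliseconds.
 * hz is the kernel tick rate; the log does not state it, so the caller
 * has to pass an assumed value. */
static uint64_t jiffies_to_ms(uint64_t jiffies, uint64_t hz)
{
    return jiffies * 1000u / hz;
}
```

If HZ were 1000, the two reports above (t=84007 and t=147012) would correspond to stalls of about 84 s and 147 s, roughly 63 s apart. With a different HZ the figures scale accordingly.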


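🔍 Root-cause analysis:

1. HIFB_WBC_StartWorkQueue never returns
  • HIFB is the frame buffer driver module of the HiSilicon platform

  • HIFB_WBC_StartWorkQueue is the write-back (Write Back) initialization path

  • Somewhere inside this function the code spins waiting for a condition that is never met (for example DMA completion, a register status, or a clock that is not ready).

2. DMA allocation fails or deadlocks
  • The log shows: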

    [<c0649554>] (HI_DRV_SMMU_AllocAndMap)
    [<c0780fe0>] (OPTM_AllocAndMap)
    [<c07874d4>] (HIFB_WBC_InitLayer)

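    This indicates that HIFB blocks while trying to allocate a DMA buffer. Possible causes:

    • IOMMU/SMMU initialization has not finished

    • The memory pool is exhausted

    • A clock or power domain is not enabled

    • Registers are configured in the wrong order

    • ✅ Root cause summary

    • HIFB_WBC_InitLayer() → OPTM_AllocAndMap() → HI_DRV_SMMU_AllocAndMap() → hil_mmb_alloc() → __dma_clear_buffer() → on_each_cpu()

    • on_each_cpu() waits until every CPU has run a callback. The trace shows the kworker on CPU1 spinning in smp_call_function_many() underneath it, so that wait never completes and the CPU hangs.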
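
The wait-for-all-CPUs pattern behind that last point can be illustrated with a small user-space toy model. This is not kernel code: struct model_cpu, model_step() and model_on_each_cpu() are invented for illustration, and the trace alone does not say which CPU failed to respond or why. The sketch only shows that a sender which waits for every other CPU cannot finish if a single CPU never runs the callback:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one CPU as seen by the sender of a cross-call. */
struct model_cpu {
    bool online;        /* CPU is part of the target set */
    bool can_take_ipi;  /* CPU is able to service the cross-call interrupt */
    bool done;          /* CPU has run the callback */
};

/* One simulated time step: every reachable CPU runs the callback. */
static void model_step(struct model_cpu *cpus, size_t ncpus)
{
    size_t i;

    for (i = 0; i < ncpus; i++) {
        if (cpus[i].online && cpus[i].can_take_ipi)
            cpus[i].done = true;
    }
}

/* Model of "run on every CPU and wait": the sender runs the callback
 * locally, then polls until all online CPUs report done.
 * Returns the number of polls needed, or -1 after max_polls.
 * The real kernel wait has no such limit, so the caller can spin forever. */
static int model_on_each_cpu(struct model_cpu *cpus, size_t ncpus,
                             size_t self, int max_polls)
{
    int poll;
    size_t i;

    cpus[self].done = true;
    for (poll = 0; poll < max_polls; poll++) {
        bool all_done = true;

        model_step(cpus, ncpus);
        for (i = 0; i < ncpus; i++) {
            if (cpus[i].online && !cpus[i].done)
                all_done = false;
        }
        if (all_done)
            return poll;
    }
    return -1;
}
```

With four reachable CPUs the call returns at once. Mark one CPU as unable to take the interrupt and the sender exhausts max_polls, which is the model's stand-in for the endless spin seen in the trace.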

With the key clue that HIFB_WBC_StartWorkQueue never returns, the next step is to find it in the source. Searching the source tree for the name turns up:

HiSTBAndroidV600R003C00SPC020\device\hisilicon\bigfish\sdk\source\msp\drv\hifb\adp\src\drv_hifb_wbc.c, which contains the definition of HIFB_WBC_StartWorkQueue():

#ifndef HI_BUILD_IN_BOOT
HI_VOID HIFB_WBC_StartWorkQueue(struct work_struct *data)
{
    HIFB_LAYER_ID_E u32LayerID = HIFB_LAYER_ID_BUTT;
    OPTM_GFX_WORK_S *pstOpenSlvWork = container_of(data, OPTM_GFX_WORK_S, work);

    u32LayerID = (HIFB_LAYER_ID_E)(pstOpenSlvWork->u32Data);
    HIFB_WBC_InitLayer(u32LayerID);

    return;
}
#endif

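The suggested fix: remove the HIFB_WBC_InitLayer() call from the work queue and invoke it manually later in the boot process instead.

Step 1: disable the call in the work queue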

--- a/drivers/graphics/hifb/drv_hifb_wbc.c
+++ b/drivers/graphics/hifb/drv_hifb_wbc.c
@@ -480,7 +480,10 @@ HI_VOID HIFB_WBC_InitLayer(HIFB_LAYER_ID_E enLayerId)
 #ifndef HI_BUILD_IN_BOOT
 HI_VOID HIFB_WBC_StartWorkQueue(struct work_struct *data)
 {
-	HIFB_LAYER_ID_E u32LayerID = HIFB_LAYER_ID_BUTT;
+	/* Deferred init: do not run in the kworker, to avoid the on_each_cpu deadlock */
+	return;
+
+	HIFB_LAYER_ID_E u32LayerID = HIFB_LAYER_ID_BUTT;  /* original logic is now dead code */
 	OPTM_GFX_WORK_S *pstOpenSlvWork = container_of(data, OPTM_GFX_WORK_S, work);
 	u32LayerID = (HIFB_LAYER_ID_E)(pstOpenSlvWork->u32Data);
 	HIFB_WBC_InitLayer(u32LayerID);

Step 2: add a late_initcall callback
Add the following at the end of the same file (anywhere at file scope works):

#if !defined(HI_BUILD_IN_BOOT)
static int __init hifb_wbc_late_init(void)
{
    /* Initialize WBC late in boot, once multi-core scheduling has settled */
    HIFB_WBC_InitLayer(HIFB_LAYER_SD_0);
    return 0;
}
late_initcall(hifb_wbc_late_init);
#endif

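Step 3: rebuild the kernel and flash kernel.img to verify

Things did not go that smoothly, though: the hang still occurred. The log was different this time, which suggests the change above did take effect. The new log: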

INFO: rcu_sched self-detected stall on CPU { 0}  (t=147006 jiffies g=-290 c=-291 q=112)
Task dump for CPU 0:
swapper/0       R running      0     1      0 0x00000002
[<c00188f8>] (unwind_backtrace) from [<c0013d44>] (show_stack+0x20/0x24)
[<c0013d44>] (show_stack) from [<c004f83c>] (sched_show_task+0xa8/0xf4)
[<c004f83c>] (sched_show_task) from [<c0052438>] (dump_cpu_task+0x3c/0x4c)
[<c0052438>] (dump_cpu_task) from [<c0070128>] (rcu_dump_cpu_stacks+0xa4/0xd8)
[<c0070128>] (rcu_dump_cpu_stacks) from [<c0073628>] (rcu_check_callbacks+0x428/0x764)
[<c0073628>] (rcu_check_callbacks) from [<c00783a8>] (update_process_times+0x50/0x70)
[<c00783a8>] (update_process_times) from [<c00863e4>] (tick_periodic+0x44/0xcc)
[<c00863e4>] (tick_periodic) from [<c008667c>] (tick_handle_periodic+0x38/0x98)
[<c008667c>] (tick_handle_periodic) from [<c05d51d0>] (arch_timer_handler_phys+0x40/0x48)
[<c05d51d0>] (arch_timer_handler_phys) from [<c006c930>] (handle_percpu_devid_irq+0xb4/0x11c)
[<c006c930>] (handle_percpu_devid_irq) from [<c00688d0>] (__handle_domain_irq+0x8c/0xe4)
[<c00688d0>] (__handle_domain_irq) from [<c00086a8>] (gic_handle_irq+0x30/0x6c)
[<c00086a8>] (gic_handle_irq) from [<c00148c0>] (__irq_svc+0x40/0x54)
Exception stack(0xde45dc70 to 0xde45dcb8)
dc60:                                     00000003 debe2630 00000003 00000001
dc80: c1260a78 debc1bc4 debc1bc0 c12613a0 00000004 c0021d60 00000000 de45dcec
dca0: 00000003 de45dcb8 c008db64 c008db88 20000113 ffffffff
[<c00148c0>] (__irq_svc) from [<c008db88>] (smp_call_function_many+0x214/0x2a0)
[<c008db88>] (smp_call_function_many) from [<c008dc88>] (on_each_cpu+0x38/0x58)
[<c008dc88>] (on_each_cpu) from [<c064c6f0>] (__dma_clear_buffer+0x124/0x150)
[<c064c6f0>] (__dma_clear_buffer) from [<c064d7c4>] (hil_mmb_alloc+0x108/0x2e8)
[<c064d7c4>] (hil_mmb_alloc) from [<c064a458>] (new_mmb+0x24/0x44)
[<c064a458>] (new_mmb) from [<c0649554>] (HI_DRV_SMMU_AllocAndMap+0x3c/0x110)
[<c0649554>] (HI_DRV_SMMU_AllocAndMap) from [<c07dae3c>] (DMX_OsiInitDmxSet+0x80/0x814)
[<c07dae3c>] (DMX_OsiInitDmxSet) from [<c07db650>] (DmxCluster_StartCluster+0x80/0x170)
[<c07db650>] (DmxCluster_StartCluster) from [<c07cff54>] (HI_DRV_DMX_Init+0x44/0x54)
[<c07cff54>] (HI_DRV_DMX_Init) from [<c094add0>] (MCE_ModuleInit+0x80/0x2e4)
[<c094add0>] (MCE_ModuleInit) from [<c094b050>] (MCE_Init+0x1c/0x14c)
[<c094b050>] (MCE_Init) from [<c00089dc>] (do_one_initcall+0xb8/0x1f0)
[<c00089dc>] (do_one_initcall) from [<c11abe28>] (kernel_init_freeable+0x10c/0x1d4)
[<c11abe28>] (kernel_init_freeable) from [<c0c50328>] (kernel_init+0x1c/0xfc)
[<c0c50328>] (kernel_init) from [<c000f960>] (ret_from_fork+0x14/0x20)
Task dump for CPU 3:
swapper/3       R running      0     0      1 0x00000002
[<c0c56a88>] (__schedule) from [<de513fc0>] (0xde513fc0)

🔍 Next step: pin down the new culprit

In the latest log, the stuck call path is:

[<c07dae3c>] (DMX_OsiInitDmxSet+0x80/0x814)
[<c07db650>] (DmxCluster_StartCluster+0x80/0x170)
[<c07cff54>] (HI_DRV_DMX_Init+0x44/0x54)
[<c094add0>] (MCE_ModuleInit+0x80/0x2e4)

The problem has moved to the DMX (Demux) driver!
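
Picking the driver-level frames out of a long trace like this can be scripted. The sketch below assumes lines of the form `[<addr>] (callee) from [<addr>] (caller+off/size)`, as in the logs above, and extracts the first symbol on a line. trace_line_symbol() is a name made up for this example:

```c
#include <stddef.h>
#include <string.h>

/* Copy the first "(symbol)" found on a backtrace line into out.
 * Returns 0 on success, -1 if the line has no such symbol or the
 * output buffer is too small. */
static int trace_line_symbol(const char *line, char *out, size_t out_size)
{
    const char *open = strchr(line, '(');
    const char *close;
    size_t len;

    if (open == NULL || out_size == 0)
        return -1;
    close = strchr(open, ')');
    if (close == NULL)
        return -1;
    len = (size_t)(close - open) - 1;
    if (len >= out_size)
        return -1;
    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

Running it over each line of a dump and keeping the names that do not belong to the generic kernel (the HI_*, DMX_*, MCE_* and HIFB_* symbols here) leaves a short list of driver functions to look up in the SDK source.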


✅ Quick verification: skip DMX initialization

Make DMX_OsiInitDmxSet() return immediately at the top. Source location: HiSTBAndroidV600R003C00SPC020\device\hisilicon\bigfish\sdk\source\msp\drv\demux\demux_v2\drv_demux_func.c

static HI_S32 DMX_OsiInitDmxSet(Dmx_Set_S * DmxSet)
{
    if(1)
    {
        printk(KERN_WARNING "DMX_OsiInitDmxSet skipped for RCU stall debug\n");
        return HI_SUCCESS;
    }
    else
    {
        HI_S32 ret = HI_FAILURE;
        HI_UNF_DMX_PORT_ATTR_S  PortAttr;
        HI_UNF_DMX_TSO_PORT_ATTR_S  TSOPortAttr;
        ....
    }
}
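
The if(1) wrapper is fine for a quick experiment but easy to forget about afterwards. One possible tidier form is a named build-time switch. In the sketch below, DMX_SKIP_INIT and DMX_OsiInitDmxSet_Original() are hypothetical names, and the typedefs and macros are stand-ins so that it compiles outside the SDK; in the driver the real definitions come from the HiSilicon headers and the message would go through printk():

```c
#include <stdio.h>

/* Stand-ins for SDK types and macros, only so this sketch compiles alone. */
typedef int HI_S32;
#define HI_SUCCESS 0
#define HI_FAILURE (-1)
typedef struct { int unused; } Dmx_Set_S;

/* Hypothetical build-time switch; not part of the SDK. */
#ifndef DMX_SKIP_INIT
#define DMX_SKIP_INIT 1
#endif

/* Placeholder for the original function body, which is elided in the post. */
static HI_S32 DMX_OsiInitDmxSet_Original(Dmx_Set_S *DmxSet)
{
    (void)DmxSet;
    return HI_FAILURE;
}

static HI_S32 DMX_OsiInitDmxSet(Dmx_Set_S *DmxSet)
{
    if (DMX_SKIP_INIT) {
        /* printk(KERN_WARNING ...) in the real driver */
        fprintf(stderr, "DMX_OsiInitDmxSet skipped for RCU stall debug\n");
        return HI_SUCCESS;
    }
    return DMX_OsiInitDmxSet_Original(DmxSet);
}
```

Building with -DDMX_SKIP_INIT=0 restores the original path without touching the source again, which makes it easier to compare both variants on the same unit.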

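I rebuilt the kernel and flashed kernel.img to verify, and this time it worked: the device no longer hung on the boot logo. Testing showed no impact on other functions either. Still, since this skips DMX initialization outright, I was not entirely comfortable and wanted to know what DMX is actually for. Looking it up turned up the following:

  1. The DMX (Demux) module only has to be initialized when hardware demultiplexing is needed (satellite/cable/terrestrial/IPTV streams);
    if your device only does network OTT playback, local USB file playback, or HDMI input, DMX can be left uninitialized with no loss of function.

  2. To move DMX out of the kernel's early init and load it on demand from user space, you only need to:

    • Register HI_DRV_DMX_Init() with module_init (rather than core_initcall)

    • Leave the DMX driver out of the built-in kernel by default (or build it as an <M> module)

    • In init.rc, run insmod /vendor/lib/modules/hi_drv_dmx.ko (when to load it is up to you)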


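I. What DMX initialization actually does

Scenario                          | DMX needed? | Notes
IPTV/OTT network playback         | No          | The stream is already pure ES; no demultiplexing required
Local USB MP4/MKV                 | No          | Software demux is done in the player
HDMI/CVBS input                   | No          | No TS stream
Satellite/cable/terrestrial DVB   | Yes         | Hardware TS demux is needed to obtain the audio/video PIDs
Carrier IPTV (multicast UDP TS)   | Yes         | Hardware demultiplexing is needed

Since everything works after you skipped initialization, your product falls under the scenarios that do not need DMX, so there is no impact.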


II. How to defer DMX initialization to user space

① Build the driver as a module (recommended)
make ARCH=arm menuconfig
Device Drivers  --->
    <M> Hisilicon DMX support        # was <*>, change to <M>

Save, then rebuild:

make ARCH=arm CROSS_COMPILE=arm-hisiv500-linux- modules

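This produces:

drivers/media/dmx/hi_drv_dmx.ko

Put it in /vendor/lib/modules/ and add this to init.rc:

on post-fs-data
    insmod /vendor/lib/modules/hi_drv_dmx.ko

This way HI_DRV_DMX_Init() no longer runs during kernel boot. The driver is loaded only once the file system is ready, which should avoid the RCU stall at boot.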

② If it must be built into the kernel, switch to late_initcall

Change:

// drivers/media/dmx/dmx_drv.c
// module_init(HI_DRV_DMX_Init);    // original
late_initcall(HI_DRV_DMX_Init);     // runs after the device-level initcalls

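This moves the call from level 6 (device_initcall, which is what module_init expands to for built-in code) to level 7 (late_initcall), avoiding the on_each_cpu() risk in the earlier stage. Note that in the trace above HI_DRV_DMX_Init() is reached through MCE_ModuleInit(), so that call path would have to be deferred as well for this to help.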
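
For reference, the numeric initcall levels in the mainline kernel's include/linux/init.h can be written down as a small lookup table. The table and initcall_level_of() are illustration only, not kernel API; the level numbers themselves are the mainline ones (the rootfs level, which sits between fs and device, is left out because it has no number):

```c
#include <stddef.h>
#include <string.h>

struct initcall_level {
    const char *macro;  /* registration macro used by a driver */
    int level;          /* lower levels run earlier during boot */
};

static const struct initcall_level initcall_levels[] = {
    { "pure_initcall",     0 },
    { "core_initcall",     1 },
    { "postcore_initcall", 2 },
    { "arch_initcall",     3 },
    { "subsys_initcall",   4 },
    { "fs_initcall",       5 },
    { "device_initcall",   6 },
    { "late_initcall",     7 },
};

/* Return the level for a registration macro, or -1 if unknown.
 * module_init() in built-in code is an alias for device_initcall(). */
static int initcall_level_of(const char *macro)
{
    size_t i;

    if (strcmp(macro, "module_init") == 0)
        macro = "device_initcall";
    for (i = 0; i < sizeof(initcall_levels) / sizeof(initcall_levels[0]); i++) {
        if (strcmp(initcall_levels[i].macro, macro) == 0)
            return initcall_levels[i].level;
    }
    return -1;
}
```

So a function registered with late_initcall runs after everything registered with module_init in built-in code. Within one level the order depends on link order.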

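③ Skip it entirely (verified)

Keep your current early return. On a product without DVB functionality this has no side effects.


III. Summary

  • DMX only handles hardware demultiplexing of TS streams; in scenarios without TS it can safely be skipped.

  • The simplest way to defer initialization is to build hi_drv_dmx as <M> and insmod it from init.rc. This keeps the functionality and removes the RCU stall at boot.

Our product's use case happens not to need DMX, so I simply skipped DMX initialization to solve the problem.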

