由application绑定失败引起的ANR
1、背景
在开发过程中,项目要求进行应用的进程判断,如果应用运行在系统进程,需要退出进程(即只允许应用运行在用户进程上)。
2、抛出ANR
加上应用运行进程的相关判断后,系统上电启动的很短一段时间内抛出了几十次ANR,日志如下:
02-08 11:29:41.550 11925 11925 D nativeloader: Configuring clns-shared-6 for other apk /system/app/MyapplicationService/MyapplicationService.apk. target_sdk_version=34, uses_libraries=, library_path=/system/app/MyapplicationService/lib/arm64:/system/app/MyapplicationService/MyapplicationService.apk!/lib/arm64-v8a:/system/lib64:/system_ext/lib64, permitted_path=/data:/mnt/expand:/data/user/0/org.myapplication.myapplication_app:/system/app/MyapplicationService:/system/lib64:/system_ext/lib64
02-08 11:29:41.552 1731 11956 E ActivityManager: ANR in org.myapplication.myapplication_app
02-08 11:29:41.552 1731 11956 E ActivityManager: PID: 11925
02-08 11:29:41.552 1731 11956 E ActivityManager: Reason: Process ProcessRecord{b5489a0 11925:org.myapplication.myapplication_app/1000} failed to complete startup
02-08 11:29:41.552 1731 11956 E ActivityManager: ErrorId: 144baf44-0bd7-4283-8d23-5472004fc9f3
02-08 11:29:41.552 1731 11956 E ActivityManager: Frozen: false
02-08 11:29:41.552 1731 11956 E ActivityManager: Load: 0.0 / 0.0 / 0.0
02-08 11:29:41.552 1731 11956 E ActivityManager: ----- Output from /proc/pressure/memory -----
02-08 11:29:41.552 1731 11956 E ActivityManager: some avg10=0.00 avg60=0.00 avg300=0.00 total=1989
02-08 11:29:41.552 1731 11956 E ActivityManager: full avg10=0.00 avg60=0.00 avg300=0.00 total=1641
02-08 11:29:41.552 1731 11956 E ActivityManager: ----- End output from /proc/pressure/memory -----
02-08 11:29:41.552 1731 11956 E ActivityManager: ----- Output from /proc/pressure/cpu -----
02-08 11:29:41.552 1731 11956 E ActivityManager: some avg10=25.66 avg60=26.19 avg300=15.18 total=67738154
02-08 11:29:41.552 1731 11956 E ActivityManager: full avg10=0.00 avg60=0.00 avg300=0.00 total=0
02-08 11:29:41.552 1731 11956 E ActivityManager: ----- End output from /proc/pressure/cpu -----
02-08 11:29:41.552 1731 11956 E ActivityManager: ----- Output from /proc/pressure/io -----
02-08 11:29:41.552 1731 11956 E ActivityManager: some avg10=0.00 avg60=0.04 avg300=0.24 total=2099581
02-08 11:29:41.552 1731 11956 E ActivityManager: full avg10=0.00 avg60=0.01 avg300=0.09 total=1065175
02-08 11:29:41.552 1731 11956 E ActivityManager: ----- End output from /proc/pressure/io -----
02-08 11:29:41.552 1731 11956 E ActivityManager:
02-08 11:29:41.552 1731 11956 E ActivityManager: CPU usage from 26707ms to 1320ms ago (2025-02-08 11:29:14.757 to 2025-02-08 11:29:40.143):
02-08 11:29:41.552 1731 11956 E ActivityManager: 31% 927/surfaceflinger: 15% user + 15% kernel / faults: 57 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 23% 3260/com.baidu.che.codriver: 20% user + 3.6% kernel / faults: 3236 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 21% 896/media.hwcodec: 5.8% user + 15% kernel / faults: 90 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 14% 8044/org.myapplication.myapplication_app: 10% user + 4.8% kernel / faults: 1386 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 14% 18941/adbd: 4% user + 10% kernel / faults: 206213 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 14% 1731/system_server: 9.5% user + 5.1% kernel / faults: 7931 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 14% 3881/com.baidu.naviauto: 9.8% user + 4.2% kernel / faults: 8695 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 13% 21389/app_process: 6.7% user + 6.6% kernel / faults: 319 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 6.9% 337/logd: 1.1% user + 5.7% kernel / faults: 3174 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 6.4% 891/vendor.qti.hardware.display.composer-service: 3.2% user + 3.2% kernel / faults: 14 minor
02-08 11:29:41.552 1731 11956 E ActivityManager: 54% TOTAL: 22% user + 25% kernel + 0% iowait + 4.4% irq + 2.5% softirq
02-08 11:29:41.552 1731 11956 E ActivityManager: CPU usage from 254161ms to 254161ms ago (1970-01-01 00:00:00.000 to 1970-01-01 00:00:00.000) with 0% awake:
02-08 11:29:41.552 1731 11956 E ActivityManager: 0% TOTAL: 0% user + 0% kernel
一般来说,系统dump的ANR信息会直接给出抛出ANR的详细原因,但是这个ANR的信息有些少见:
02-08 11:29:41.552 1731 11956 E ActivityManager: Reason: Process ProcessRecord{b5489a0 11925:org.myapplication.myapplication_app/1000} failed to complete startup
它不是Android定义的四大类引起ANR原因的中的一种:input/ContentProvider/Broadcast/Service。于是,我把ANR日志拿来分析,确实也没有提供更多线索:
Subject: Process ProcessRecord{b5489a0 11969:org.myapplication.myapplication_app/1000} failed to complete startup
RssHwmKb: 125140
RssKb: 123096
RssAnonKb: 59260
RssShmemKb: 2548
VmSwapKb: 0
— CriticalEventLog —
capacity: 20
events <
timestamp_ms: 1739014181473
anr <
pid: 11925
process: “org.myapplication.myapplication_app”
process_class: 2
subject: “Process ProcessRecord{b5489a0 11925:org.myapplication.myapplication_app/1000} failed to complete startup”
uid: 1000timestamp_ms: 1739014181851
window_ms: 300000
3、日志分析
从logcat日志及ANR日志没能获取更多的线索,那只能从代码入手看看。在Android源码里搜索关键字failed to complete startup,找到这个字符串由ActivityManagerService.java抛出:
private void handleBindApplicationTimeoutHard(ProcessRecord app) {
final String anrMessage;
synchronized (app) {
anrMessage = "Process " + app + " failed to complete startup";
}
mAnrHelper.appNotResponding(app, TimeoutRecord.forAppStart(anrMessage));
}
追踪代码后,handleBindApplicationTimeoutHard()是由handleBindApplicationTimeoutSoft()里发送了BIND_APPLICATION_TIMEOUT_HARD_MSG消息引起:
private void handleBindApplicationTimeoutSoft(ProcessRecord app, int softTimeoutMillis) {
// Similar logic as the broadcast delivery timeout:
// instead of immediately triggering an ANR, extend the timeout by
// the amount of time the process was runnable-but-waiting; we're
// only willing to do this once before triggering an hard ANR.
final long cpuDelayTime = app.getCpuDelayTime() - app.mProfile.mLastCpuDelayTime.get();
final long hardTimeoutMillis = MathUtils.constrain(cpuDelayTime, 0, softTimeoutMillis);
if (hardTimeoutMillis == 0) {
handleBindApplicationTimeoutHard(app);
return;
}
Slog.i(TAG, "Extending process start timeout by " + hardTimeoutMillis + "ms for " + app);
Trace.instant(Trace.TRACE_TAG_ACTIVITY_MANAGER, "bindApplicationTimeSoft "
+ app.processName + "(" + app.getPid() + ")");
final Message msg = mHandler.obtainMessage(BIND_APPLICATION_TIMEOUT_HARD_MSG, app);
mHandler.sendMessageDelayed(msg, hardTimeoutMillis);
}
而handleBindApplicationTimeoutSoft又是由private void attachApplicationLocked(@NonNull IApplicationThread thread,
int pid, int callingUid, long startSeq)在thread.bindApplication期间抛出。根据Android应用的启动流程,这个ANR应该是在在应用启动的第2步,ActivityManagerService绑定待启动应用的ApplicationThread抛出。
4、解决办法
根据日志及业务分析,应用应该是在系统初始化阶段,由systemservice启动了;systemservice启动的应用都运行在系统进程,在代码里面判断出应用启动在了系统进程被强制回收。但是,应用的后台Service被定义为START_STICKY,即使进程被杀也会重启。于是应用进入了被杀->拉起->被杀->拉起的无限循环中,在某一次被杀拉起的过程中,最终抛出了这个failed to complete startup的ANR。分析到这,这个ANR的解决办法也就很清楚了:1、取消由systemservice拉起应用;或者2、取消应用对运行进程的判断。