一. 概述
当input事件处理得慢就会触发ANR,那ANR内部原理是什么,哪些场景会产生ANR呢。 “工欲善其事必先利其器”,为了理解input ANR原理,前面几篇文章疏通了整个input框架的处理流程,都是为了这篇文章而做铺垫。在正式开始分析ANR触发原理以及触发场景之前,先来回顾一下input流程。
1.1 InputReader
点击查看:
InputReader的主要工作分两部分:
- 调用EventHub的getEvents()读取节点/dev/input的input_event结构体转换成RawEvent结构体,RawEvent根据不同InputMapper来转换成相应的EventEntry,比如按键事件则对应KeyEntry,触摸事件则对应MotionEntry。
- 转换结果:inut_event -> EventEntry;
- 将事件添加到mInboundQueue队列尾部,加入该队列前有以下两个过滤:
- IMS.interceptKeyBeforeQueueing:事件分发前可增加业务逻辑;
- IMS.filterInputEvent:可拦截事件,当返回值为false的事件都直接拦截,没有机会加入mInboundQueue队列,不会再往下分发;否则进入下一步;
- enqueueInboundEventLocked:该事件放入mInboundQueue队列尾部;
- mLooper->wake:并根据情况来唤醒InputDispatcher线程.
- KeyboardInputMapper.processKey()的过程, 记录下按下down事件的时间点.
1.2 InputDispatcher
点击查看:
- dispatchOnceInnerLocked(): 从InputDispatcher的
mInboundQueue
队列,取出事件EventEntry。另外该方法开始执行的时间点(currentTime)便是后续事件dispatchEntry的分发时间(deliveryTime) - dispatchKeyLocked():满足一定条件时会添加命令doInterceptKeyBeforeDispatchingLockedInterruptible;
- enqueueDispatchEntryLocked():生成事件DispatchEntry并加入connection的
outbound
队列 - startDispatchCycleLocked():从outboundQueue中取出事件DispatchEntry, 重新放入connection的
waitQueue
队列; - runCommandsLockedInterruptible():通过循环遍历地方式,依次处理mCommandQueue队列中的所有命令。而mCommandQueue队列中的命令是通过postCommandLocked()方式向该队列添加的。ANR回调命令便是在这个时机执行。
- handleTargetsNotReadyLocked(): 该过程会判断是否等待超过5s来决定是否调用onANRLocked().
流程15中sendMessage是将input事件分发到app端,当app处理完该事件后会发送finishInputEvent()事件. 接下来又回到pollOnce()方法.
1.3 UI Thread
- “InputDispatcher”线程监听socket服务端,收到消息后回调InputDispatcher.handleReceiveCallback();
- UI主线程监听socket客户端,收到消息后回调NativeInputEventReceiver.handleEvent().
对于ANR的触发主要是在InputDispatcher过程,下面再从ANR的角度来说一说ANR触发过程。
二. ANR处理流程
ANR时间区别便是指当前这次的事件dispatch过程中执行findFocusedWindowTargetsLocked()方法到下一次执行resetANRTimeoutsLocked()的时间区间. 以下5个时机会reset. 都位于InputDispatcher.cpp文件:
- resetAndDropEverythingLocked
- releasePendingEventLocked
- setFocusedApplication
- dispatchOnceInnerLocked
- setInputDispatchMode
简单来说, 主要是以下4个场景,会有机会执行resetANRTimeoutsLocked:
- 解冻屏幕, 系统开/关机的时刻点 (thawInputDispatchingLw, setEventDispatchingLw)
- wms聚焦app的改变 (WMS.setFocusedApp, WMS.removeAppToken)
- 设置input filter的过程 (IMS.setInputFilter)
- 再次分发事件的过程(dispatchOnceInnerLocked)
当 findFocusedWindowTargetsLocked()过程调用到handleTargetsNotReadyLocked,且满足超时5s的情况则会调用onANRLocked().
2.1 onANRLocked
[-> InputDispatcher.cpp]
void InputDispatcher::onANRLocked( nsecs_t currentTime, const sp & applicationHandle, const sp & windowHandle, nsecs_t eventTime, nsecs_t waitStartTime, const char* reason) { float dispatchLatency = (currentTime - eventTime) * 0.000001f; float waitDuration = (currentTime - waitStartTime) * 0.000001f; ALOGI("Application is not responding: %s. " "It has been %0.1fms since event, %0.1fms since wait started. Reason: %s", getApplicationWindowLabelLocked(applicationHandle, windowHandle).string(), dispatchLatency, waitDuration, reason); //捕获ANR的现场信息 time_t t = time(NULL); struct tm tm; localtime_r(&t, &tm); char timestr[64]; strftime(timestr, sizeof(timestr), "%F %T", &tm); mLastANRState.clear(); mLastANRState.append(INDENT "ANR:\n"); mLastANRState.appendFormat(INDENT2 "Time: %s\n", timestr); mLastANRState.appendFormat(INDENT2 "Window: %s\n", getApplicationWindowLabelLocked(applicationHandle, windowHandle).string()); mLastANRState.appendFormat(INDENT2 "DispatchLatency: %0.1fms\n", dispatchLatency); mLastANRState.appendFormat(INDENT2 "WaitDuration: %0.1fms\n", waitDuration); mLastANRState.appendFormat(INDENT2 "Reason: %s\n", reason); dumpDispatchStateLocked(mLastANRState); //将ANR命令加入mCommandQueue CommandEntry* commandEntry = postCommandLocked( & InputDispatcher::doNotifyANRLockedInterruptible); commandEntry->inputApplicationHandle = applicationHandle; commandEntry->inputWindowHandle = windowHandle; commandEntry->reason = reason; }
发生ANR调用onANRLocked()的过程会将doNotifyANRLockedInterruptible加入mCommandQueue。 在下一轮InputDispatcher.dispatchOnce的过程中会先执行runCommandsLockedInterruptible()方法,取出 mCommandQueue队列的所有命令逐一执行。那么ANR所对应的命令doNotifyANRLockedInterruptible,接下来看该方法。
3.2 doNotifyANRLockedInterruptible
[-> InputDispatcher.cpp]
void InputDispatcher::doNotifyANRLockedInterruptible( CommandEntry* commandEntry) { mLock.unlock(); //[见小节3.3] nsecs_t newTimeout = mPolicy->notifyANR( commandEntry->inputApplicationHandle, commandEntry->inputWindowHandle, commandEntry->reason); mLock.lock(); //newTimeout =5s [见小节3.8] resumeAfterTargetsNotReadyTimeoutLocked(newTimeout, commandEntry->inputWindowHandle != NULL ? commandEntry->inputWindowHandle->getInputChannel() : NULL);}
mPolicy是指NativeInputManager
3.3 NativeInputManager.notifyANR
[-> com_android_server_input_InputManagerService.cpp]
nsecs_t NativeInputManager::notifyANR(const sp & inputApplicationHandle, const sp & inputWindowHandle, const String8& reason) { JNIEnv* env = jniEnv(); jobject inputApplicationHandleObj = getInputApplicationHandleObjLocalRef(env, inputApplicationHandle); jobject inputWindowHandleObj = getInputWindowHandleObjLocalRef(env, inputWindowHandle); jstring reasonObj = env->NewStringUTF(reason.string()); //调用Java方法[见小节3.4] jlong newTimeout = env->CallLongMethod(mServiceObj, gServiceClassInfo.notifyANR, inputApplicationHandleObj, inputWindowHandleObj, reasonObj); if (checkAndClearExceptionFromCallback(env, "notifyANR")) { newTimeout = 0; //抛出异常,则清理并重置timeout } ... return newTimeout; }
先看看register_android_server_InputManager过程:
int register_android_server_InputManager(JNIEnv* env) { int res = jniRegisterNativeMethods(env, "com/android/server/input/InputManagerService", gInputManagerMethods, NELEM(gInputManagerMethods)); jclass clazz; FIND_CLASS(clazz, "com/android/server/input/InputManagerService"); ... GET_METHOD_ID(gServiceClassInfo.notifyANR, clazz, "notifyANR", "(Lcom/android/server/input/InputApplicationHandle;Lcom/android/server/input/InputWindowHandle;Ljava/lang/String;)J"); ... }
可知gServiceClassInfo.notifyANR是指IMS.notifyANR
3.4 IMS.notifyANR
[-> InputManagerService.java]
private long notifyANR(InputApplicationHandle inputApplicationHandle, InputWindowHandle inputWindowHandle, String reason) { //[见小节3.5] return mWindowManagerCallbacks.notifyANR( inputApplicationHandle, inputWindowHandle, reason); }
此处mWindowManagerCallbacks是指InputMonitor对象。
3.5 InputMonitor.notifyANR
[-> InputMonitor.java]
public long notifyANR(InputApplicationHandle inputApplicationHandle, InputWindowHandle inputWindowHandle, String reason) { AppWindowToken appWindowToken = null; WindowState windowState = null; boolean aboveSystem = false; synchronized (mService.mWindowMap) { if (inputWindowHandle != null) { windowState = (WindowState) inputWindowHandle.windowState; if (windowState != null) { appWindowToken = windowState.mAppToken; } } if (appWindowToken == null && inputApplicationHandle != null) { appWindowToken = (AppWindowToken)inputApplicationHandle.appWindowToken; } //输出input事件分发超时log if (windowState != null) { Slog.i(WindowManagerService.TAG, "Input event dispatching timed out " + "sending to " + windowState.mAttrs.getTitle() + ". Reason: " + reason); int systemAlertLayer = mService.mPolicy.windowTypeToLayerLw( WindowManager.LayoutParams.TYPE_SYSTEM_ALERT); aboveSystem = windowState.mBaseLayer > systemAlertLayer; } else if (appWindowToken != null) { Slog.i(WindowManagerService.TAG, "Input event dispatching timed out " + "sending to application " + appWindowToken.stringName + ". Reason: " + reason); } else { Slog.i(WindowManagerService.TAG, "Input event dispatching timed out " + ". Reason: " + reason); } mService.saveANRStateLocked(appWindowToken, windowState, reason); } if (appWindowToken != null && appWindowToken.appToken != null) { //【见小节3.6.1】 boolean abort = appWindowToken.appToken.keyDispatchingTimedOut(reason); if (! abort) { return appWindowToken.inputDispatchingTimeoutNanos; //5s } } else if (windowState != null) { //【见小节3.6.2】 long timeout = ActivityManagerNative.getDefault().inputDispatchingTimedOut( windowState.mSession.mPid, aboveSystem, reason); if (timeout >= 0) { return timeout * 1000000L; //5s } } return 0; }
发生input相关的ANR时在system log输出ANR信息,并且tag为WindowManager. 主要有3类log:
- Input event dispatching timed out sending to [windowState.mAttrs.getTitle()]
- Input event dispatching timed out sending to application [appWindowToken.stringName)]
- Input event dispatching timed out sending.
3.6 DispatchingTimedOut
3.6.1 Token.keyDispatchingTimedOut
[-> ActivityRecord.java :: Token]
final class ActivityRecord { static class Token extends IApplicationToken.Stub { public boolean keyDispatchingTimedOut(String reason) { ActivityRecord r; ActivityRecord anrActivity; ProcessRecord anrApp; synchronized (mService) { r = tokenToActivityRecordLocked(this); if (r == null) { return false; } anrActivity = r.getWaitingHistoryRecordLocked(); anrApp = r != null ? r.app : null; } //[见小节3.7] return mService.inputDispatchingTimedOut(anrApp, anrActivity, r, false, reason); } ... } }
3.6.2 AMS.inputDispatchingTimedOut
public long inputDispatchingTimedOut(int pid, final boolean aboveSystem, String reason) { ... ProcessRecord proc; long timeout; synchronized (this) { synchronized (mPidsSelfLocked) { proc = mPidsSelfLocked.get(pid); //根据pid查看进程record } timeout = getInputDispatchingTimeoutLocked(proc); } //【见小节3.7】 if (!inputDispatchingTimedOut(proc, null, null, aboveSystem, reason)) { return -1; } return timeout; }
inputDispatching的超时为KEY_DISPATCHING_TIMEOUT
,即timeout = 5s。
3.7 AMS.inputDispatchingTimedOut
public boolean inputDispatchingTimedOut(final ProcessRecord proc, final ActivityRecord activity, final ActivityRecord parent, final boolean aboveSystem, String reason) { ... final String annotation; if (reason == null) { annotation = "Input dispatching timed out"; } else { annotation = "Input dispatching timed out (" + reason + ")"; } if (proc != null) { ... //通过handler机制,交由“ActivityManager”线程执行ANR处理过程。 mHandler.post(new Runnable() { public void run() { appNotResponding(proc, activity, parent, aboveSystem, annotation); } }); } return true; }
会输出现场的重要进程的trace等信息。 再回到【小节3.2】处理完ANR后再调用resumeAfterTargetsNotReadyTimeoutLocked。
3.8 resumeAfterTargetsNotReadyTimeoutLocked
[-> InputDispatcher.cpp]
void InputDispatcher::resumeAfterTargetsNotReadyTimeoutLocked(nsecs_t newTimeout, const sp & inputChannel) { if (newTimeout > 0) { //超时时间增加5s mInputTargetWaitTimeoutTime = now() + newTimeout; } else { // Give up. mInputTargetWaitTimeoutExpired = true; // Input state will not be realistic. Mark it out of sync. if (inputChannel.get()) { ssize_t connectionIndex = getConnectionIndexLocked(inputChannel); if (connectionIndex >= 0) { sp connection = mConnectionsByFd.valueAt(connectionIndex); sp windowHandle = connection->inputWindowHandle; if (windowHandle != NULL) { const InputWindowInfo* info = windowHandle->getInfo(); if (info) { ssize_t stateIndex = mTouchStatesByDisplay.indexOfKey(info->displayId); if (stateIndex >= 0) { mTouchStatesByDisplay.editValueAt(stateIndex).removeWindow( windowHandle); } } } if (connection->status == Connection::STATUS_NORMAL) { CancelationOptions options(CancelationOptions::CANCEL_ALL_EVENTS, "application not responding"); synthesizeCancelationEventsForConnectionLocked(connection, options); } } } } }
四. input死锁监测机制
4.1 IMS.start
[-> InputManagerService.java]
public void start() { ... Watchdog.getInstance().addMonitor(this); ... }
InputManagerService实现了Watchdog.Monitor接口, 并且在启动过程将自己加入到了Watchdog线程的monitor队列.
4.2 IMS.monitor
Watchdog便会定时调用IMS.monitor()方法.
public void monitor() { synchronized (mInputFilterLock) { } nativeMonitor(mPtr); }
nativeMonitor经过JNI调用,进如如下方法:
static void nativeMonitor(JNIEnv*, jclass, jlong ptr) { NativeInputManager* im = reinterpret_cast (ptr); im->getInputManager()->getReader()->monitor(); //见小节4.3 im->getInputManager()->getDispatcher()->monitor(); //见小节4.4 }
4.3 InputReader.monitor
[-> InputReader.cpp]
void InputReader::monitor() { //请求和释放一次mLock,来确保reader没有发生死锁的问题 mLock.lock(); mEventHub->wake(); mReaderIsAliveCondition.wait(mLock); mLock.unlock(); //监测EventHub[见小节4.3.1] mEventHub->monitor();}
获取mLock之后进入Condition类型的wait()方法,等待InputReader线程的loopOnce()中的broadcast()来唤醒.
void InputReader::loopOnce() { size_t count = mEventHub->getEvents(timeoutMillis, mEventBuffer, EVENT_BUFFER_SIZE); ... { AutoMutex _l(mLock); mReaderIsAliveCondition.broadcast(); if (count) { processEventsLocked(mEventBuffer, count); } } ... mQueuedListener->flush();}
4.3.1 EventHub.monitor
[-> EventHub.cpp]
void EventHub::monitor() { //请求和释放一次mLock,来确保reader没有发生死锁的问题 mLock.lock(); mLock.unlock();}
4.4 InputDispatcher
[-> InputDispatcher.cpp]
void InputDispatcher::monitor() { mLock.lock(); mLooper->wake(); mDispatcherIsAliveCondition.wait(mLock); mLock.unlock();}
获取mLock之后进入Condition类型的wait()方法,等待IInputDispatcher线程的loopOnce()中的broadcast()来唤醒.
void InputDispatcher::dispatchOnce() { nsecs_t nextWakeupTime = LONG_LONG_MAX; { AutoMutex _l(mLock); mDispatcherIsAliveCondition.broadcast(); if (!haveCommandsLocked()) { dispatchOnceInnerLocked(&nextWakeupTime); } if (runCommandsLockedInterruptible()) { nextWakeupTime = LONG_LONG_MIN; } } nsecs_t currentTime = now(); int timeoutMillis = toMillisecondTimeoutDelay(currentTime, nextWakeupTime); mLooper->pollOnce(timeoutMillis); //进入epoll_wait }
4.5 小节
通过将InputManagerService加入到Watchdog的monitor队列,定时监测是否发生死锁. 整个监测过涉及EventHub, InputReader, InputDispatcher, InputManagerService的死锁监测. 监测的原理很简单,通过尝试获取锁并释放锁的方式.
最后, 可通过adb shell dumpsys input
来查看手机当前的input状态, 输出内容分别为EventHub.dump(), InputReader.dump(),InputDispatcher.dump()这3类,另外如果发生过input ANR,那么也会输出上一个ANR的状态.
其中mPendingEvent代表的当下正在处理的事件.
五. 总结
5.1 ANR分类
由小节[3.5] InputMonitor.notifyANR完成, 当发生ANR时system log中会出现以下信息, 并且TAG=WindowManager:
Input event dispatching timed out xxx. Reason: + reason
, 其中xxx取值:
- 窗口类型: sending to
windowState.mAttrs.getTitle()
- 应用类型: sending to application
appWindowToken.stringName
- 其他类型: 则为空.
至于Reason主要有以下类型:
5.1.1 reason类型
由小节[2.3.1]checkWindowReadyForMoreInputLocked完成, ANR reason主要有以下几类:
- 无窗口, 有应用:Waiting because no window has focus but there is a focused application that may eventually add a window when it finishes starting up.
- 窗口暂停: Waiting because the
[targetType]
window is paused. - 窗口未连接: Waiting because the
[targetType]
window’s input channel is not registered with the input dispatcher. The window may be in the process of being removed. - 窗口连接已死亡:Waiting because the
[targetType]
window’s input connection is[Connection.Status]
. The window may be in the process of being removed. - 窗口连接已满:Waiting because the
[targetType]
window’s input channel is full. Outbound queue length:[outboundQueue长度]
. Wait queue length:[waitQueue长度]
. - 按键事件,输出队列或事件等待队列不为空:Waiting to send key event because the
[targetType]
window has not finished processing all of the input events that were previously delivered to it. Outbound queue length:[outboundQueue长度]
. Wait queue length:[waitQueue长度]
. - 非按键事件,事件等待队列不为空且头事件分发超时500ms:Waiting to send non-key event because the
[targetType]
window has not finished processing certain input events that were delivered to it over 500ms ago. Wait queue length:[waitQueue长度]
. Wait queue head age:[等待时长]
.
其中
- targetType: 取值为”focused”或者”touched”
- Connection.Status: 取值为”NORMAL”,”BROKEN”,”ZOMBIE”
另外, findFocusedWindowTargetsLocked, findTouchedWindowTargetsLocked这两个方法中可以通过实现 updateDispatchStatisticsLocked()来分析anr问题.
5.2 drop事件分类
由小节[2.1.2] dropInboundEventLocked完成,输出事件丢弃的原因:
- DROP_REASON_POLICY: “inbound event was dropped because the policy consumed it”;
- DROP_REASON_DISABLED: “inbound event was dropped because input dispatch is disabled”;
- DROP_REASON_APP_SWITCH: “inbound event was dropped because of pending overdue app switch”;
- DROP_REASON_BLOCKED: “inbound event was dropped because the current application is not responding and the user has started interacting with a different application””;
- DROP_REASON_STALE: “inbound event was dropped because it is stale”;
其他:
- doDispatchCycleFinishedLockedInterruptible的过程, 会记录分发时间超过2s的事件,
- findFocusedWindowTargetsLocked的过程, 可以统计等待时长信息.
转自: