How WatchDog Works
[Based on Android P]
1. SystemServer.startOtherServices
private void startOtherServices() {
    final Context context = mSystemContext;
    ...
    try {
        ...
        traceBeginAndSlog("InitWatchdog");
        // [2] Instantiate
        final Watchdog watchdog = Watchdog.getInstance();
        // [3] Initialize
        watchdog.init(context, mActivityManagerService);
        traceEnd();
        ...
        traceBeginAndSlog("StartWatchdog");
        // [4] Start
        Watchdog.getInstance().start();
        traceEnd();
        ...
    } catch (RuntimeException e) {
        Slog.e("System", "");
        Slog.e("System", " Failure starting core service", e);
    }
    ...
}
Watchdog extends the Thread class and is instantiated as a singleton; its init method is then called to initialize it.
2. Watchdog.getInstance
public static Watchdog getInstance() {
    if (sWatchdog == null) {
        sWatchdog = new Watchdog();
    }
    return sWatchdog;
}
This lazily creates the single Watchdog instance on the first call and returns the same instance afterwards.
2.1. Watchdog.Watchdog
private Watchdog() {
    super("watchdog");
    // Initialize a handler checker for each common thread we want to check.
    // Note that we are not currently checking the background thread,
    // since it can potentially hold longer-running operations
    // with no guarantee of timeliness for the operations there.

    // Monitor the android.fg thread
    mMonitorChecker = new HandlerChecker(FgThread.getHandler(),
            "foreground thread", DEFAULT_TIMEOUT);
    mHandlerCheckers.add(mMonitorChecker);
    // Monitor the main thread
    mHandlerCheckers.add(new HandlerChecker(new Handler(Looper.getMainLooper()),
            "main thread", DEFAULT_TIMEOUT));
    // Monitor the android.ui thread
    mHandlerCheckers.add(new HandlerChecker(UiThread.getHandler(),
            "ui thread", DEFAULT_TIMEOUT));
    // Monitor the android.io thread
    mHandlerCheckers.add(new HandlerChecker(IoThread.getHandler(),
            "i/o thread", DEFAULT_TIMEOUT));
    // Monitor the android.display thread
    mHandlerCheckers.add(new HandlerChecker(DisplayThread.getHandler(),
            "display thread", DEFAULT_TIMEOUT));

    // Initialize monitoring of the binder threads
    addMonitor(new BinderThreadMonitor());

    // Create the fd monitor; the open file descriptors are listed under /proc/self/fd/
    mOpenFdMonitor = OpenFdMonitor.create();

    // See the notes on DEFAULT_TIMEOUT.
    assert DB || DEFAULT_TIMEOUT > ZygoteConnectionConstants.WRAPPED_PID_TIMEOUT_MILLIS;
}
3. Watchdog.init
public void init(Context context, ActivityManagerService activity) {
    mResolver = context.getContentResolver();
    mActivity = activity;
    context.registerReceiver(new RebootRequestReceiver(),
            new IntentFilter(Intent.ACTION_REBOOT),
            android.Manifest.permission.REBOOT, null);
}
This registers a receiver for the reboot broadcast, which is the so-called soft reboot.
3.1. RebootRequestReceiver.onReceive
final class RebootRequestReceiver extends BroadcastReceiver {
    @Override
    public void onReceive(Context c, Intent intent) {
        if (intent.getIntExtra("nowait", 0) != 0) {
            rebootSystem("Received ACTION_REBOOT broadcast");
            return;
        }
        Slog.w(TAG, "Unsupported ACTION_REBOOT broadcast: " + intent);
    }
}
When the "nowait" extra is non-zero, RebootRequestReceiver's onReceive method calls rebootSystem (the reboot operation of PMS) to reboot the phone; any other ACTION_REBOOT broadcast is only logged as unsupported.
4. Watchdog.getInstance().start()
Because Watchdog is itself a Thread, calling its start method launches a new thread that executes its run method.
Watchdog.run():
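The relationship between start() and run() can be illustrated with a minimal plain-Java sketch. DemoWatchdog and its ranOn field are hypothetical names used only for this illustration; they are not part of the Android source.

```java
// Illustrative sketch only: DemoWatchdog is a hypothetical stand-in, not the AOSP class.
final class DemoWatchdog extends Thread {
    // Name of the thread that actually executed run(), recorded for inspection.
    volatile String ranOn;

    DemoWatchdog() {
        super("watchdog"); // the thread name, passed to Thread's constructor
    }

    @Override
    public void run() {
        ranOn = Thread.currentThread().getName();
    }

    public static void main(String[] args) throws InterruptedException {
        DemoWatchdog w = new DemoWatchdog();
        w.start(); // spawns a new thread, which then invokes run()
        w.join();  // wait for that thread to finish
        System.out.println("run() executed on thread: " + w.ranOn);
    }
}
```

Calling w.run() directly instead of w.start() would execute the body on the caller's thread, which is why SystemServer calls start().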
static final boolean DB = false;
static final long DEFAULT_TIMEOUT = DB ? 10*1000 : 60*1000;
static final long CHECK_INTERVAL = DEFAULT_TIMEOUT / 2; // 30s

@Override
public void run() {
    boolean waitedHalf = false;
    while (true) {
        final List<HandlerChecker> blockedCheckers;
        final String subject;
        final boolean allowRestart;
        int debuggerWasConnected = 0;
        synchronized (this) {
            long timeout = CHECK_INTERVAL; // 30s
            // Poll all monitors every 30s [5]
            for (int i=0; i<mHandlerCheckers.size(); i++) {
                HandlerChecker hc = mHandlerCheckers.get(i);
                hc.scheduleCheckLocked();
            }

            if (debuggerWasConnected > 0) {
                debuggerWasConnected--;
            }

            // Make sure the code below runs only after a full 30s
            // (guards against wait(timeout) being interrupted early)
            long start = SystemClock.uptimeMillis();
            while (timeout > 0) {
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                try {
                    wait(timeout);
                } catch (InterruptedException e) {
                    Log.wtf(TAG, e);
                }
                if (Debug.isDebuggerConnected()) {
                    debuggerWasConnected = 2;
                }
                timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start);
            }

            boolean fdLimitTriggered = false;
            if (mOpenFdMonitor != null) {
                fdLimitTriggered = mOpenFdMonitor.monitor();
            }

            // Evaluate the completion state of the checkers and act on it
            if (!fdLimitTriggered) {
                // [6]
                final int waitState = evaluateCheckerCompletionLocked();
                if (waitState == COMPLETED) {
                    // Everything completed; start the next round
                    waitedHalf = false;
                    continue;
                } else if (waitState == WAITING) {
                    // Still waiting, but the timeout has not been exceeded
                    continue;
                } else if (waitState == WAITED_HALF) {
                    if (!waitedHalf) {
                        // After being blocked for 30s, dump the stacks of
                        // system_server and some native processes once
                        ArrayList<Integer> pids = new ArrayList<Integer>();
                        pids.add(Process.myPid());
                        ActivityManagerService.dumpStackTraces(true, pids, null, null,
                                getInterestingNativePids());
                        // waitedHalf ensures that the next pass in the same state
                        // does not dump again; the code below does the later dump.
                        waitedHalf = true;
                    }
                    continue;
                }

                // The state is OVERDUE, i.e. more than 60 seconds have passed
                blockedCheckers = getBlockedCheckersLocked(); // [7]
                subject = describeCheckersLocked(blockedCheckers);
            } else {
                blockedCheckers = Collections.emptyList();
                subject = "Open FD high water mark reached";
            }
            allowRestart = mAllowRestart;
        }

        // Reaching this point means a monitored thread in system_server has been
        // stuck for more than 60s (or the fd limit was hit). Dump the stacks,
        // then kill system_server so that it restarts.
        EventLog.writeEvent(EventLogTags.WATCHDOG, subject);

        ArrayList<Integer> pids = new ArrayList<>();
        pids.add(Process.myPid());
        if (mPhonePid > 0) pids.add(mPhonePid);
        // Dump the stacks of the process that is about to be killed [8]
        final File stack = ActivityManagerService.dumpStackTraces(
                !waitedHalf, pids, null, null, getInterestingNativePids());

        // Leave a little extra time so the dump can be written out completely
        SystemClock.sleep(2000);

        // Ask the kernel to dump all blocked threads and the backtraces
        // on all CPUs to the kernel log [9]
        doSysRq('w');
        doSysRq('l');

        // Try to add the error to the dropbox
        Thread dropboxThread = new Thread("watchdogWriteToDropbox") {
            public void run() {
                mActivity.addErrorToDropBox(
                        "watchdog", null, "system_server", null, null,
                        subject, null, stack, null);
            }
        };
        dropboxThread.start();
        try {
            dropboxThread.join(2000); // wait up to 2 seconds for it to return.
        } catch (InterruptedException ignored) {}

        IActivityController controller;
        synchronized (this) {
            controller = mController;
        }
        if (controller != null) {
            Slog.i(TAG, "Reporting stuck state to activity controller");
            try {
                Binder.setDumpDisabled("Service dumps disabled due to hung system process.");
                // 1 = keep waiting, -1 = kill system
                int res = controller.systemNotResponding(subject);
                if (res >= 0) {
                    Slog.i(TAG, "Activity controller requested to coninue to wait");
                    waitedHalf = false;
                    continue;
                }
            } catch (RemoteException e) {
            }
        }

        // Only kill the process if the debugger is not attached.
        if (Debug.isDebuggerConnected()) {
            debuggerWasConnected = 2;
        }
        if (debuggerWasConnected >= 2) {
            Slog.w(TAG, "Debugger connected: Watchdog is *not* killing the system process");
        } else if (debuggerWasConnected > 0) {
            Slog.w(TAG, "Debugger was connected: Watchdog is *not* killing the system process");
        } else if (!allowRestart) {
            Slog.w(TAG, "Restart not allowed: Watchdog is *not* killing the system process");
        } else {
            Slog.w(TAG, "* WATCHDOG KILLING SYSTEM PROCESS: " + subject);
            WatchdogDiagnostics.diagnoseCheckers(blockedCheckers);
            Slog.w(TAG, "* GOODBYE!");
            // Kill system_server
            Process.killProcess(Process.myPid());
            System.exit(10);
        }

        waitedHalf = false;
    }
}
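This method is the core of Watchdog monitoring.
It performs different actions depending on waitState:
- COMPLETED or WAITING: skip the rest of this round and continue with the next loop iteration;
- WAITED_HALF (blocked for more than 30s) for the first time: output the traces of system_server and some native processes;
- OVERDUE: dump more information and, unless a debugger or the activity controller prevents it, kill system_server.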
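Before waitState is evaluated, run() makes sure that a full CHECK_INTERVAL has really elapsed by recomputing the remaining timeout after every wakeup. The following is a minimal plain-Java sketch of that pattern. WaitLoopSketch, remaining and waitFullInterval are hypothetical names, and System.nanoTime() is used here as an assumed stand-in for SystemClock.uptimeMillis().

```java
// Illustrative sketch only: a wait loop that tolerates early wakeups.
final class WaitLoopSketch {
    // Mirrors: timeout = CHECK_INTERVAL - (SystemClock.uptimeMillis() - start)
    static long remaining(long intervalMs, long startMs, long nowMs) {
        return intervalMs - (nowMs - startMs);
    }

    // Monotonic clock in milliseconds (assumed substitute for uptimeMillis()).
    static long nowMs() {
        return System.nanoTime() / 1_000_000;
    }

    // Blocks for at least intervalMs, even if wait() returns early
    // (interrupt, notify, or spurious wakeup). Returns the elapsed time.
    static long waitFullInterval(Object lock, long intervalMs) {
        long start = nowMs();
        long timeout = intervalMs;
        synchronized (lock) {
            while (timeout > 0) {
                try {
                    lock.wait(timeout);
                } catch (InterruptedException e) {
                    // The real code logs here; this sketch simply keeps waiting.
                }
                timeout = remaining(intervalMs, start, nowMs());
            }
        }
        return nowMs() - start;
    }

    public static void main(String[] args) {
        long elapsed = waitFullInterval(new Object(), 100);
        System.out.println("waited at least 100 ms: " + (elapsed >= 100));
    }
}
```

A single wait(timeout) call would not be enough, because it may return before the timeout expires; looping on the recomputed remainder keeps the polling period stable.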
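The following sections analyze this method in detail:
- [5] hc.scheduleCheckLocked(); // schedule a check on every HandlerChecker, which runs its monitors
- [6] evaluateCheckerCompletionLocked(); // evaluate the completion state of the HandlerCheckers
- [7] getBlockedCheckersLocked() // get the HandlerCheckers that have been stuck for 60s
- [8] ActivityManagerService.dumpStackTraces // dump the call stacks
- [9] doSysRq(); // dump to the kernel log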
5. Watchdog.HandlerChecker.scheduleCheckLocked
public final class HandlerChecker implements Runnable {
    private final Handler mHandler;
    private final String mName;
    private final long mWaitMax;
    private final ArrayList<Monitor> mMonitors = new ArrayList<Monitor>();
    private boolean mCompleted;
    private Monitor mCurrentMonitor;
    private long mStartTime;

    HandlerChecker(Handler handler, String name, long waitMaxMillis) {
        mHandler = handler;
        mName = name;
        mWaitMax = waitMaxMillis;
        mCompleted = true;
    }

    public void addMonitor(Monitor monitor) {
        mMonitors.add(monitor);
    }

    public void scheduleCheckLocked() {
        if (mMonitors.size() == 0 && mHandler.getLooper().getQueue().isPolling()) {
            // No monitors (true for every checker except the android.fg one)
            // and the looper is polling, i.e. idle: mark the check as completed.
            mCompleted = true;
            return;
        }

        if (!mCompleted) {
            // The previous check has not finished yet, so return directly.
            return;
        }

        mCompleted = false;
        mCurrentMonitor = null;
        mStartTime = SystemClock.uptimeMillis(); // record the start time of this check
        mHandler.postAtFrontOfQueue(this); // post the message at the very front of the queue
    }
    ......
}
mHandler.postAtFrontOfQueue(this): this method takes a Runnable. Through the message mechanism, the target thread's Looper eventually calls back the run method of HandlerChecker.
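The effect of posting at the front of the queue can be sketched in plain Java with a double-ended queue. This is only an analogy: FrontOfQueueSketch and drain are hypothetical names, and a java.util.ArrayDeque stands in for the Android MessageQueue, which it does not reproduce.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Illustrative sketch only: a deque standing in for a handler's message queue.
final class FrontOfQueueSketch {
    // Enqueues two ordinary messages, then a check at the front,
    // and returns the order in which a single consumer would run them.
    static List<String> drain() {
        Deque<Runnable> queue = new ArrayDeque<>();
        List<String> order = new ArrayList<>();

        queue.addLast(() -> order.add("message A"));       // like Handler.post
        queue.addLast(() -> order.add("message B"));       // like Handler.post
        queue.addFirst(() -> order.add("watchdog check")); // like postAtFrontOfQueue

        while (!queue.isEmpty()) {
            queue.pollFirst().run();
        }
        return order;
    }

    public static void main(String[] args) {
        System.out.println(drain());
    }
}
```

Jumping the queue means the check measures only how long the message currently being processed takes, not the backlog of pending messages behind it.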
5.1. HandlerChecker.run
[-> Watchdog.java]
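@Override
public void run() {
    final int size = mMonitors.size();
    for (int i = 0 ; i < size ; i++) {
        synchronized (Watchdog.this) {
            mCurrentMonitor = mMonitors.get(i);
        }
        // Call back the monitor method of the service that implements Watchdog.Monitor
        mCurrentMonitor.monitor();
    }

    synchronized (Watchdog.this) {
        mCompleted = true;
        mCurrentMonitor = null;
    }
}
The run method iterates over all registered Monitor objects; each service implements the interface's monitor() method. Once every monitor has returned, mCompleted is set to true. If the message the handler is currently processing takes so long that run() never gets a chance to execute, or if a monitor() call blocks, mCompleted stays false and the Watchdog is eventually triggered.
Taking AMS as an example of a service that implements Watchdog.Monitor and has its monitor method called back: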
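How a busy handler thread keeps a check from completing can be sketched with a single-threaded executor in plain Java. This is a simplified model under stated assumptions: MiniChecker and its methods are hypothetical, the executor replaces the Handler and Looper, and the monitor list is omitted.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: a checker whose completion flag depends on the worker thread.
final class MiniChecker implements Runnable {
    private final ExecutorService worker;
    private volatile boolean completed = true;
    private volatile CountDownLatch finished = new CountDownLatch(0);

    MiniChecker(ExecutorService worker) {
        this.worker = worker;
    }

    void scheduleCheck() {
        if (!completed) {
            return; // the previous check is still pending
        }
        completed = false;
        finished = new CountDownLatch(1);
        worker.execute(this); // runs only when the worker thread is free
    }

    @Override
    public void run() {
        completed = true;
        finished.countDown();
    }

    boolean isCompleted() {
        return completed;
    }

    // Returns {completed while the worker is blocked, completed after it is released}.
    static boolean[] demo() {
        ExecutorService worker = Executors.newSingleThreadExecutor();
        CountDownLatch release = new CountDownLatch(1);
        try {
            // Simulate a message that hangs the worker thread.
            worker.execute(() -> {
                try {
                    release.await();
                } catch (InterruptedException ignored) {
                    // shutting down
                }
            });
            MiniChecker checker = new MiniChecker(worker);
            checker.scheduleCheck();
            boolean whileBlocked = checker.isCompleted();
            release.countDown(); // unblock the worker
            checker.finished.await(5, TimeUnit.SECONDS);
            return new boolean[] { whileBlocked, checker.isCompleted() };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        } finally {
            worker.shutdownNow();
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("completed while blocked: " + r[0] + ", after release: " + r[1]);
    }
}
```

A watchdog thread only needs to read the completed flag and the start time periodically; it never has to touch the monitored thread itself.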
public class ActivityManagerService extends IActivityManager.Stub
        implements Watchdog.Monitor, BatteryStatsImpl.BatteryCallback {
    ...
    public ActivityManagerService(Context systemContext) {
        ...
        Watchdog.getInstance().addMonitor(this);
        Watchdog.getInstance().addThread(mHandler);
        ...
    }

    // Acquires and immediately releases the AMS lock. If another thread holds
    // the lock for too long (for example in a deadlock), this call blocks and
    // the Watchdog detects it.
    public void monitor() {
        synchronized (this) { }
    }
    ...
}
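Why an empty synchronized block is a useful probe can be shown with a small plain-Java sketch. LockMonitorSketch, DemoService and holdLockUntil are hypothetical names for this illustration only; the Monitor interface here is a local copy of the idea, not the Android one.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Illustrative sketch only: monitor() blocks for as long as the service lock is held.
final class LockMonitorSketch {
    interface Monitor {
        void monitor();
    }

    static final class DemoService implements Monitor {
        private final Object lock = new Object();

        @Override
        public void monitor() {
            synchronized (lock) {
                // Nothing to do: acquiring the lock is the whole check.
            }
        }

        // Simulates a thread that holds the service lock for a long time.
        void holdLockUntil(CountDownLatch held, CountDownLatch release) {
            synchronized (lock) {
                held.countDown();
                try {
                    release.await();
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            }
        }
    }

    // Returns {monitor() returned while the lock was held, monitor() returned after release}.
    static boolean[] demo() {
        DemoService service = new DemoService();
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        CountDownLatch monitored = new CountDownLatch(1);
        Thread holder = new Thread(() -> service.holdLockUntil(held, release), "lock-holder");
        Thread checker = new Thread(() -> {
            service.monitor();
            monitored.countDown();
        }, "checker");
        try {
            holder.start();
            held.await(); // the lock is now definitely held
            checker.start();
            boolean whileHeld = monitored.await(200, TimeUnit.MILLISECONDS);
            release.countDown();
            boolean afterRelease = monitored.await(5, TimeUnit.SECONDS);
            holder.join();
            checker.join();
            return new boolean[] { whileHeld, afterRelease };
        } catch (InterruptedException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        boolean[] r = demo();
        System.out.println("returned while held: " + r[0] + ", after release: " + r[1]);
    }
}
```

The probe costs almost nothing when the lock is free, and it turns "some thread has held this lock too long" into "monitor() has not returned", which is what the checker's completion flag records.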
6. Watchdog.evaluateCheckerCompletionLocked
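private int evaluateCheckerCompletionLocked() {
    int state = COMPLETED;
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        state = Math.max(state, hc.getCompletionStateLocked());
    }
    return state;
}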
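evaluateCheckerCompletionLocked() returns the largest (worst) wait state among all checkers in the mHandlerCheckers list.
getCompletionStateLocked() returns one of:
- COMPLETED = 0: the check has completed;
- WAITING = 1: the wait time is less than half of DEFAULT_TIMEOUT, i.e. less than 30s;
- WAITED_HALF = 2: the wait time is between 30s and 60s;
- OVERDUE = 3: the wait time is greater than or equal to 60s.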
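The four states and the "take the worst one" rule can be written as two small pure functions. This is a sketch built from the thresholds listed above; CompletionStateSketch, completionState and worstState are hypothetical names, and the elapsed time is passed in as a parameter instead of being read from a clock.

```java
// Illustrative sketch only: state classification and aggregation as pure functions.
final class CompletionStateSketch {
    static final int COMPLETED = 0;
    static final int WAITING = 1;
    static final int WAITED_HALF = 2;
    static final int OVERDUE = 3;

    // Classifies one checker from its completion flag and elapsed wait time.
    static int completionState(boolean completed, long elapsedMs, long waitMaxMs) {
        if (completed) {
            return COMPLETED;
        }
        if (elapsedMs < waitMaxMs / 2) {
            return WAITING;
        }
        if (elapsedMs < waitMaxMs) {
            return WAITED_HALF;
        }
        return OVERDUE;
    }

    // The overall state is the maximum over all checkers.
    static int worstState(int... states) {
        int state = COMPLETED;
        for (int s : states) {
            state = Math.max(state, s);
        }
        return state;
    }

    public static void main(String[] args) {
        long timeout = 60_000;
        int fg = completionState(true, 0, timeout);
        int main = completionState(false, 45_000, timeout);
        int ui = completionState(false, 10_000, timeout);
        System.out.println("overall state = " + worstState(fg, main, ui));
    }
}
```

Because the constants are ordered by severity, a single Math.max is enough to let one stuck thread determine the state of the whole system.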
7. Watchdog.getBlockedCheckersLocked()
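private ArrayList<HandlerChecker> getBlockedCheckersLocked() {
    ArrayList<HandlerChecker> checkers = new ArrayList<HandlerChecker>();
    for (int i=0; i<mHandlerCheckers.size(); i++) {
        HandlerChecker hc = mHandlerCheckers.get(i);
        if (hc.isOverdueLocked()) {
            checkers.add(hc);
        }
    }
    return checkers;
}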
8. ActivityManagerService.dumpStackTraces
[Figure: detailed flowchart of the overall Watchdog process]
Published by: 全栈程序员-站长. Please credit the source when reposting: https://javaforall.net/212790.html. Original link: https://javaforall.net
