An Analysis of How Android Rescue Mode Works

Introduction

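Every Android device that ships today, whether a phone, a tablet, or an in-vehicle head unit, has gone through many rounds of testing, so the software meets at least a baseline level of quality.

In practice, however, once a device is on the market, testing cannot reproduce everything users will do. Consumers include non-technical users, and when a device misbehaves their only option is the vendor's after-sales service. After-sales service, in turn, generally does not cover devices that have been rooted or flashed with unofficial software builds.

Android takes this into account and adds a rescue mode, which in severe cases offers the user the option of a factory reset. That mechanism is the subject of this article.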

Rescue levels

The system defines a different rescue level for each degree of severity, as follows:

@VisibleForTesting
static final int LEVEL_NONE = 0;
@VisibleForTesting
static final int LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS = 1;
@VisibleForTesting
static final int LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES = 2;
@VisibleForTesting
static final int LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS = 3;
@VisibleForTesting
static final int LEVEL_FACTORY_RESET = 4;

As the level goes from 0 to 4, the action taken becomes progressively more drastic; level 4 is a factory reset.

App-level rescue implementation

[Figure: flow chart of the app-level rescue path]

Let us walk through the implementation:

File: frameworks/base/core/java/com/android/internal/os/RuntimeInit.java

/**
 * Handle application death from an uncaught exception. The framework
 * catches these for the main threads, so this should only matter for
 * threads created by applications. Before this method runs, the given
 * instance of {@link LoggingHandler} should already have logged details
 * (and if not it is run first).
 */
private static class KillApplicationHandler implements Thread.UncaughtExceptionHandler {
    private final LoggingHandler mLoggingHandler;

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        try {
            ensureLogging(t, e);

            // Don't re-enter -- avoid infinite loops if crash-reporting crashes.
            if (mCrashing) return;
            mCrashing = true;

            // Try to end profiling. If a profiler is running at this point, and we kill the
            // process (below), the in-memory buffer will be lost. So try to stop, which will
            // flush the buffer. (This makes method trace profiling useful to debug crashes.)
            if (ActivityThread.currentActivityThread() != null) {
                ActivityThread.currentActivityThread().stopProfiling();
            }

            // Bring up crash dialog, wait for it to be dismissed
            ActivityManager.getService().handleApplicationCrash(
                    mApplicationObject, new ApplicationErrorReport.ParcelableCrashInfo(e));
        } catch (Throwable t2) {
            if (t2 instanceof DeadObjectException) {
                // System process is dead; ignore
            } else {
                try {
                    Clog_e(TAG, "Error reporting crash", t2);
                } catch (Throwable t3) {
                    // Even Clog_e() fails! Oh well.
                }
            }
        } finally {
            // Try everything to make sure this process goes away.
            Process.killProcess(Process.myPid());
            System.exit(10);
        }
    }
}

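KillApplicationHandler is a nested class; only its uncaughtException method is excerpted here.

When a thread in the app throws an uncaught exception, this method handles it: it reports the crash and then, in the finally block, kills the process.

The report goes through ActivityManager.getService().handleApplicationCrash, which drives the rest of the handling.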
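To make the handler's control flow concrete, here is a minimal, self-contained sketch of an uncaught-exception handler with the same kind of re-entrancy guard. It is not the framework code: CrashHandlerSketch, reported, and runCrashingThreads are hypothetical names, and the sketch only counts crashes instead of calling handleApplicationCrash or killing the process.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical stand-in for KillApplicationHandler; it never kills the process. */
final class CrashHandlerSketch implements Thread.UncaughtExceptionHandler {
    // Plays the role of mCrashing: set once the first crash is being handled.
    private final AtomicBoolean crashing = new AtomicBoolean(false);
    // Number of crashes that were actually reported (demo bookkeeping only).
    final AtomicInteger reported = new AtomicInteger();

    @Override
    public void uncaughtException(Thread t, Throwable e) {
        // Don't re-enter: only the first crash gets reported.
        if (!crashing.compareAndSet(false, true)) {
            return;
        }
        reported.incrementAndGet();
        System.out.println("Reporting crash in " + t.getName() + ": " + e);
        // The framework handler would now report to the system server and
        // kill the process in a finally block; this sketch simply returns.
    }

    /** Starts {@code count} threads that each throw, all sharing one handler. */
    static int runCrashingThreads(int count) {
        CrashHandlerSketch handler = new CrashHandlerSketch();
        for (int i = 0; i < count; i++) {
            Thread worker = new Thread(() -> {
                throw new IllegalStateException("boom");
            }, "worker-" + i);
            worker.setUncaughtExceptionHandler(handler);
            worker.start();
            try {
                worker.join(); // the handler has finished once join() returns
            } catch (InterruptedException ie) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException(ie);
            }
        }
        return handler.reported.get();
    }

    public static void main(String[] args) {
        System.out.println("crashes reported: " + runCrashingThreads(3));
    }
}
```

Because the guard flips on the first crash, later crashes routed to the same handler are ignored, which is the behavior the "Don't re-enter" comment in the excerpt describes.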

File: frameworks/base/services/core/java/com/android/server/am/ActivityManagerService.java

/**
 * Used by {@link com.android.internal.os.RuntimeInit} to report when an application crashes.
 * The application process will exit immediately after this call returns.
 * @param app object of the crashing app, null for the system server
 * @param crashInfo describing the exception
 */
public void handleApplicationCrash(IBinder app,
        ApplicationErrorReport.ParcelableCrashInfo crashInfo) {
    ProcessRecord r = findAppProcess(app, "Crash");
    final String processName = app == null ? "system_server"
            : (r == null ? "unknown" : r.processName);

    handleApplicationCrashInner("crash", r, processName, crashInfo);
}

The Javadoc comment is worth noting:

Used by {@link com.android.internal.os.RuntimeInit} to report when an application crashes.

The method then hands the crashing process name and the crash info to handleApplicationCrashInner.

/* Native crash reporting uses this inner version because it needs to be somewhat
 * decoupled from the AM-managed cleanup lifecycle
 */
void handleApplicationCrashInner(String eventType, ProcessRecord r, String processName,
        ApplicationErrorReport.CrashInfo crashInfo) {

    EventLogTags.writeAmCrash(Binder.getCallingPid(),
            UserHandle.getUserId(Binder.getCallingUid()), processName,
            r == null ? -1 : r.info.flags,
            crashInfo.exceptionClassName,
            crashInfo.exceptionMessage,
            crashInfo.throwFileName,
            crashInfo.throwLineNumber);

    FrameworkStatsLog.write(FrameworkStatsLog.APP_CRASH_OCCURRED,
            Binder.getCallingUid(),
            eventType,
            processName,
            Binder.getCallingPid(),
            (r != null && r.info != null) ? r.info.packageName : "",
            (r != null && r.info != null) ? (r.info.isInstantApp()
                    ? FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__TRUE
                    : FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__FALSE)
                    : FrameworkStatsLog.APP_CRASH_OCCURRED__IS_INSTANT_APP__UNAVAILABLE,
            r != null ? (r.isInterestingToUserLocked()
                    ? FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__FOREGROUND
                    : FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__BACKGROUND)
                    : FrameworkStatsLog.APP_CRASH_OCCURRED__FOREGROUND_STATE__UNKNOWN,
            processName.equals("system_server") ? ServerProtoEnums.SYSTEM_SERVER
                    : (r != null) ? r.getProcessClassEnum()
                            : ServerProtoEnums.ERROR_SOURCE_UNKNOWN
    );

    final int relaunchReason = r == null ? RELAUNCH_REASON_NONE
            : r.getWindowProcessController().computeRelaunchReason();
    final String relaunchReasonString = relaunchReasonToString(relaunchReason);
    if (crashInfo.crashTag == null) {
        crashInfo.crashTag = relaunchReasonString;
    } else {
        crashInfo.crashTag = crashInfo.crashTag + " " + relaunchReasonString;
    }

    addErrorToDropBox(
            eventType, r, processName, null, null, null, null, null, null, crashInfo);

    mAppErrors.crashApplication(r, crashInfo);
}

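Readers familiar with the Android logging system will recognize addErrorToDropBox as a very important error-handling function. It will be covered separately in a later article on logging.

What matters here is mAppErrors.crashApplication(r, crashInfo);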

/**
 * Bring up the "unexpected error" dialog box for a crashing app.
 * Deal with edge cases (intercepts from instrumented applications,
 * ActivityController, error intent receivers, that sort of thing).
 * @param r the application crashing
 * @param crashInfo describing the failure
 */
void crashApplication(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo) {
    final int callingPid = Binder.getCallingPid();
    final int callingUid = Binder.getCallingUid();

    final long origId = Binder.clearCallingIdentity();
    try {
        crashApplicationInner(r, crashInfo, callingPid, callingUid);
    } finally {
        Binder.restoreCallingIdentity(origId);
    }
}

Next, the implementation of crashApplicationInner:

void crashApplicationInner(ProcessRecord r, ApplicationErrorReport.CrashInfo crashInfo,
        int callingPid, int callingUid) {
    long timeMillis = System.currentTimeMillis();
    String shortMsg = crashInfo.exceptionClassName;
    String longMsg = crashInfo.exceptionMessage;
    String stackTrace = crashInfo.stackTrace;
    if (shortMsg != null && longMsg != null) {
        longMsg = shortMsg + ": " + longMsg;
    } else if (shortMsg != null) {
        longMsg = shortMsg;
    }

    if (r != null) {
        mPackageWatchdog.onPackageFailure(r.getPackageListWithVersionCode(),
                PackageWatchdog.FAILURE_REASON_APP_CRASH);

        mService.mProcessList.noteAppKill(r, (crashInfo != null
                && "Native crash".equals(crashInfo.exceptionClassName))
                ? ApplicationExitInfo.REASON_CRASH_NATIVE
                : ApplicationExitInfo.REASON_CRASH,
                ApplicationExitInfo.SUBREASON_UNKNOWN,
                "crash");
    }

    final int relaunchReason = r != null
            ? r.getWindowProcessController().computeRelaunchReason() : RELAUNCH_REASON_NONE;

    AppErrorResult result = new AppErrorResult();
    int taskId;
    synchronized (mService) {
        /**
         * If crash is handled by instance of {@link android.app.IActivityController},
         * finish now and don't show the app error dialog.
         */
        if (handleAppCrashInActivityController(r, crashInfo, shortMsg, longMsg, stackTrace,
                timeMillis, callingPid, callingUid)) {
            return;
        }

        // Suppress crash dialog if the process is being relaunched due to a crash during a free
        // resize.
        if (relaunchReason == RELAUNCH_REASON_FREE_RESIZE) {
            return;
        }

        /**
         * If this process was running instrumentation, finish now - it will be handled in
         * {@link ActivityManagerService#handleAppDiedLocked}.
         */
        if (r != null && r.getActiveInstrumentation() != null) {
            return;
        }

        // Log crash in battery stats.
        if (r != null) {
            mService.mBatteryStatsService.noteProcessCrash(r.processName, r.uid);
        }

        AppErrorDialog.Data data = new AppErrorDialog.Data();
        data.result = result;
        data.proc = r;

        // If we can't identify the process or it's already exceeded its crash quota,
        // quit right away without showing a crash dialog.
        if (r == null || !makeAppCrashingLocked(r, shortMsg, longMsg, stackTrace, data)) {
            return;
        }

        final Message msg = Message.obtain();
        msg.what = ActivityManagerService.SHOW_ERROR_UI_MSG;

        taskId = data.taskId;
        msg.obj = data;
        mService.mUiHandler.sendMessage(msg);
    }

    int res = result.get();

    Intent appErrorIntent = null;
    MetricsLogger.action(mContext, MetricsProto.MetricsEvent.ACTION_APP_CRASH, res);
    if (res == AppErrorDialog.TIMEOUT || res == AppErrorDialog.CANCEL) {
        res = AppErrorDialog.FORCE_QUIT;
    }
    synchronized (mService) {
        if (res == AppErrorDialog.MUTE) {
            stopReportingCrashesLocked(r);
        }
        if (res == AppErrorDialog.RESTART) {
            mService.mProcessList.removeProcessLocked(r, false, true,
                    ApplicationExitInfo.REASON_CRASH, "crash");
            if (taskId != INVALID_TASK_ID) {
                try {
                    mService.startActivityFromRecents(taskId,
                            ActivityOptions.makeBasic().toBundle());
                } catch (IllegalArgumentException e) {
                    // Hmm...that didn't work. Task should either be in recents or associated
                    // with a stack.
                    Slog.e(TAG, "Could not restart taskId=" + taskId, e);
                }
            }
        }
        if (res == AppErrorDialog.FORCE_QUIT) {
            long orig = Binder.clearCallingIdentity();
            try {
                // Kill it with fire!
                mService.mAtmInternal.onHandleAppCrash(r.getWindowProcessController());
                if (!r.isPersistent()) {
                    mService.mProcessList.removeProcessLocked(r, false, false,
                            ApplicationExitInfo.REASON_CRASH, "crash");
                    mService.mAtmInternal.resumeTopActivities(false /* scheduleIdle */);
                }
            } finally {
                Binder.restoreCallingIdentity(orig);
            }
        }
        if (res == AppErrorDialog.APP_INFO) {
            appErrorIntent = new Intent(Settings.ACTION_APPLICATION_DETAILS_SETTINGS);
            appErrorIntent.setData(Uri.parse("package:" + r.info.packageName));
            appErrorIntent.addFlags(Intent.FLAG_ACTIVITY_NEW_TASK);
        }
        if (res == AppErrorDialog.FORCE_QUIT_AND_REPORT) {
            appErrorIntent = createAppErrorIntentLocked(r, timeMillis, crashInfo);
        }
        if (r != null && !r.isolated && res != AppErrorDialog.RESTART) {
            // XXX Can't keep track of crash time for isolated processes,
            // since they don't have a persistent identity.
            mProcessCrashTimes.put(r.info.processName, r.uid,
                    SystemClock.uptimeMillis());
        }
    }

    if (appErrorIntent != null) {
        try {
            mContext.startActivityAsUser(appErrorIntent, new UserHandle(r.userId));
        } catch (ActivityNotFoundException e) {
            Slog.w(TAG, "bug report receiver dissappeared", e);
        }
    }
}

When a crash occurs and the ProcessRecord r is not null, onPackageFailure of mPackageWatchdog is called:

mPackageWatchdog.onPackageFailure(r.getPackageListWithVersionCode(),
        PackageWatchdog.FAILURE_REASON_APP_CRASH);

onPackageFailure is implemented as follows:

/**
 * Called when a process fails due to a crash, ANR or explicit health check.
 *
 * <p>For each package contained in the process, one registered observer with the least user
 * impact will be notified for mitigation.
 *
 * <p>This method could be called frequently if there is a severe problem on the device.
 */
public void onPackageFailure(List<VersionedPackage> packages,
        @FailureReasons int failureReason) {
    if (packages == null) {
        Slog.w(TAG, "Could not resolve a list of failing packages");
        return;
    }
    mLongTaskHandler.post(() -> {
        synchronized (mLock) {
            if (mAllObservers.isEmpty()) {
                return;
            }
            boolean requiresImmediateAction = (failureReason == FAILURE_REASON_NATIVE_CRASH
                    || failureReason == FAILURE_REASON_EXPLICIT_HEALTH_CHECK);
            if (requiresImmediateAction) {
                handleFailureImmediately(packages, failureReason);
            } else {
                for (int pIndex = 0; pIndex < packages.size(); pIndex++) {
                    VersionedPackage versionedPackage = packages.get(pIndex);
                    // Observer that will receive failure for versionedPackage
                    PackageHealthObserver currentObserverToNotify = null;
                    int currentObserverImpact = Integer.MAX_VALUE;

                    // Find observer with least user impact
                    for (int oIndex = 0; oIndex < mAllObservers.size(); oIndex++) {
                        ObserverInternal observer = mAllObservers.valueAt(oIndex);
                        PackageHealthObserver registeredObserver = observer.registeredObserver;
                        if (registeredObserver != null
                                && observer.onPackageFailureLocked(
                                versionedPackage.getPackageName())) {
                            int impact = registeredObserver.onHealthCheckFailed(
                                    versionedPackage, failureReason);
                            if (impact != PackageHealthObserverImpact.USER_IMPACT_NONE
                                    && impact < currentObserverImpact) {
                                currentObserverToNotify = registeredObserver;
                                currentObserverImpact = impact;
                            }
                        }
                    }

                    // Execute action with least user impact
                    if (currentObserverToNotify != null) {
                        currentObserverToNotify.execute(versionedPackage, failureReason);
                    }
                }
            }
        }
    });
}

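When the failure reason is FAILURE_REASON_NATIVE_CRASH or FAILURE_REASON_EXPLICIT_HEALTH_CHECK, handleFailureImmediately is called, which this article treats as the rollback path. In all other cases the observers are consulted package by package. The non-rollback path is the one of interest here: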

for (int pIndex = 0; pIndex < packages.size(); pIndex++) {
    VersionedPackage versionedPackage = packages.get(pIndex);
    // Observer that will receive failure for versionedPackage
    PackageHealthObserver currentObserverToNotify = null;
    int currentObserverImpact = Integer.MAX_VALUE;

    // Find observer with least user impact
    for (int oIndex = 0; oIndex < mAllObservers.size(); oIndex++) {
        ObserverInternal observer = mAllObservers.valueAt(oIndex);
        PackageHealthObserver registeredObserver = observer.registeredObserver;
        if (registeredObserver != null
                && observer.onPackageFailureLocked(
                versionedPackage.getPackageName())) {
            int impact = registeredObserver.onHealthCheckFailed(
                    versionedPackage, failureReason);
            if (impact != PackageHealthObserverImpact.USER_IMPACT_NONE
                    && impact < currentObserverImpact) {
                currentObserverToNotify = registeredObserver;
                currentObserverImpact = impact;
            }
        }
    }

    // Execute action with least user impact
    if (currentObserverToNotify != null) {
        currentObserverToNotify.execute(versionedPackage, failureReason);
    }
}

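Here each already registered PackageHealthObserver is asked, through onHealthCheckFailed, how much user impact its mitigation would have. The observer with the lowest impact other than USER_IMPACT_NONE is selected, and its execute function is called: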

// Execute action with least user impact
if (currentObserverToNotify != null) {
    currentObserverToNotify.execute(versionedPackage, failureReason);
}
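The selection rule in the loop above can be isolated into a small standalone sketch. Everything here is a simplification for illustration: LeastImpactSketch, its Observer interface, the fixed helper, and the numeric impact values are assumptions, not the PackageWatchdog API. Only the comparison logic follows the excerpt.

```java
import java.util.Arrays;
import java.util.List;

final class LeastImpactSketch {
    // Assumed value meaning "this observer does not want to handle the failure".
    static final int USER_IMPACT_NONE = 0;

    /** Hypothetical, simplified observer; the real interface takes a VersionedPackage. */
    interface Observer {
        String name();
        int onHealthCheckFailed(String packageName);
        boolean execute(String packageName);
    }

    /** Returns the observer with the smallest impact other than NONE, or null. */
    static Observer pickLeastImpact(List<Observer> observers, String packageName) {
        Observer chosen = null;
        int chosenImpact = Integer.MAX_VALUE;
        for (Observer candidate : observers) {
            int impact = candidate.onHealthCheckFailed(packageName);
            if (impact != USER_IMPACT_NONE && impact < chosenImpact) {
                chosen = candidate;
                chosenImpact = impact;
            }
        }
        return chosen;
    }

    /** Builds an observer that always reports the same impact (demo helper). */
    static Observer fixed(String name, int impact) {
        return new Observer() {
            @Override public String name() { return name; }
            @Override public int onHealthCheckFailed(String packageName) { return impact; }
            @Override public boolean execute(String packageName) {
                System.out.println(name + " mitigates " + packageName);
                return true;
            }
        };
    }

    public static void main(String[] args) {
        List<Observer> observers = Arrays.asList(
                fixed("high-impact", 5), fixed("declines", 0), fixed("low-impact", 1));
        Observer chosen = pickLeastImpact(observers, "com.example.app");
        if (chosen != null) {
            chosen.execute("com.example.app"); // only the selected observer acts
        }
    }
}
```

An observer that returns USER_IMPACT_NONE is skipped entirely, so if every observer declines, nothing is executed for that package.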

RescueParty, which implements rescue mode, contains RescuePartyObserver, an implementation of PackageHealthObserver.

/**
 * Handle mitigation action for package failures. This observer will be register to Package
 * Watchdog and will receive calls about package failures. This observer is persistent so it
 * may choose to mitigate failures for packages it has not explicitly asked to observe.
 */
public static class RescuePartyObserver implements PackageHealthObserver {
    @Override
    public boolean execute(@Nullable VersionedPackage failedPackage,
            @FailureReasons int failureReason) {
        if (isDisabled()) {
            return false;
        }
        if (failureReason == PackageWatchdog.FAILURE_REASON_APP_CRASH
                || failureReason == PackageWatchdog.FAILURE_REASON_APP_NOT_RESPONDING) {
            int triggerUid = getPackageUid(mContext, failedPackage.getPackageName());
            incrementRescueLevel(triggerUid);
            executeRescueLevel(mContext,
                    failedPackage == null ? null : failedPackage.getPackageName());
            return true;
        } else {
            return false;
        }
    }
}

incrementRescueLevel raises the rescue level.

executeRescueLevel carries out the rescue action for the current level.

/**
 * Escalate to the next rescue level. After incrementing the level you'll
 * probably want to call {@link #executeRescueLevel(Context, String)}.
 */
private static void incrementRescueLevel(int triggerUid) {
    final int level = getNextRescueLevel();
    SystemProperties.set(PROP_RESCUE_LEVEL, Integer.toString(level));

    EventLogTags.writeRescueLevel(level, triggerUid);
    logCriticalInfo(Log.WARN, "Incremented rescue level to "
            + levelToString(level) + " triggered by UID " + triggerUid);
}

incrementRescueLevel calls getNextRescueLevel to compute the new level:

/**
 * Get the next rescue level. This indicates the next level of mitigation that may be taken.
 */
private static int getNextRescueLevel() {
    return MathUtils.constrain(SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE) + 1,
            LEVEL_NONE, LEVEL_FACTORY_RESET);
}

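The logic is simple: each call adds 1 to the stored level and clamps the result to the range LEVEL_NONE to LEVEL_FACTORY_RESET, so the level never goes past 4.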
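The increment-and-clamp behavior of getNextRescueLevel can be tried outside the framework with a short sketch. RescueLevelSketch and its local constrain helper are hypothetical; constrain is written here on the assumption that MathUtils.constrain(amount, low, high) clamps amount into [low, high], and the system property is replaced by a plain local variable.

```java
final class RescueLevelSketch {
    static final int LEVEL_NONE = 0;
    static final int LEVEL_FACTORY_RESET = 4;

    /** Local stand-in for MathUtils.constrain: clamps amount into [low, high]. */
    static int constrain(int amount, int low, int high) {
        return amount < low ? low : (amount > high ? high : amount);
    }

    /** Next rescue level for a given current level; saturates at LEVEL_FACTORY_RESET. */
    static int nextLevel(int current) {
        return constrain(current + 1, LEVEL_NONE, LEVEL_FACTORY_RESET);
    }

    public static void main(String[] args) {
        int level = LEVEL_NONE; // stands in for the persisted rescue-level property
        for (int failure = 1; failure <= 6; failure++) {
            level = nextLevel(level);
            System.out.println("after escalation " + failure + ": level " + level);
        }
    }
}
```

Starting from LEVEL_NONE, four escalations reach LEVEL_FACTORY_RESET, and any further escalation stays at that level.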

private static void executeRescueLevel(Context context, @Nullable String failedPackage) {
    final int level = SystemProperties.getInt(PROP_RESCUE_LEVEL, LEVEL_NONE);
    if (level == LEVEL_NONE) return;

    Slog.w(TAG, "Attempting rescue level " + levelToString(level));
    try {
        executeRescueLevelInternal(context, level, failedPackage);
        EventLogTags.writeRescueSuccess(level);
        logCriticalInfo(Log.DEBUG,
                "Finished rescue level " + levelToString(level));
    } catch (Throwable t) {
        logRescueException(level, t);
    }
}

executeRescueLevel reads the current level and passes it, together with failedPackage, to executeRescueLevelInternal, which does the actual work.

private static void executeRescueLevelInternal(Context context, int level, @Nullable
        String failedPackage) throws Exception {
    FrameworkStatsLog.write(FrameworkStatsLog.RESCUE_PARTY_RESET_REPORTED, level);
    switch (level) {
        case LEVEL_RESET_SETTINGS_UNTRUSTED_DEFAULTS:
            resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_DEFAULTS, failedPackage);
            break;
        case LEVEL_RESET_SETTINGS_UNTRUSTED_CHANGES:
            resetAllSettings(context, Settings.RESET_MODE_UNTRUSTED_CHANGES, failedPackage);
            break;
        case LEVEL_RESET_SETTINGS_TRUSTED_DEFAULTS:
            resetAllSettings(context, Settings.RESET_MODE_TRUSTED_DEFAULTS, failedPackage);
            break;
        case LEVEL_FACTORY_RESET:
            // Request the reboot from a separate thread to avoid deadlock on PackageWatchdog
            // when device shutting down.
            Runnable runnable = new Runnable() {
                @Override
                public void run() {
                    try {
                        RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
                    } catch (Throwable t) {
                        logRescueException(level, t);
                    }
                }
            };
            Thread thread = new Thread(runnable);
            thread.start();
            break;
    }
}

Every level below LEVEL_FACTORY_RESET performs a resetAllSettings operation, each with a different reset mode.

private static void resetAllSettings(Context context, int mode, @Nullable String failedPackage)
        throws Exception {
    // Try our best to reset all settings possible, and once finished
    // rethrow any exception that we encountered
    Exception res = null;
    final ContentResolver resolver = context.getContentResolver();
    try {
        resetDeviceConfig(context, mode, failedPackage);
    } catch (Exception e) {
        res = new RuntimeException("Failed to reset config settings", e);
    }
    try {
        Settings.Global.resetToDefaultsAsUser(resolver, null, mode, UserHandle.USER_SYSTEM);
    } catch (Exception e) {
        res = new RuntimeException("Failed to reset global settings", e);
    }
    for (int userId : getAllUserIds()) {
        try {
            Settings.Secure.resetToDefaultsAsUser(resolver, null, mode, userId);
        } catch (Exception e) {
            res = new RuntimeException("Failed to reset secure settings for " + userId, e);
        }
    }
    if (res != null) {
        throw res;
    }
}
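resetAllSettings follows a "try every step, then rethrow" pattern: a failure in one reset does not stop the remaining resets, and only the most recent failure is rethrown at the end. A minimal standalone sketch of that pattern follows; BestEffortResetSketch, Step, runAll, and demo are hypothetical names, and the steps are empty placeholders rather than real settings resets.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class BestEffortResetSketch {
    /** A reset step that may fail (hypothetical stand-in for one settings reset). */
    interface Step {
        void run() throws Exception;
    }

    /** Runs every step in order; rethrows the last failure only after all steps ran. */
    static void runAll(Map<String, Step> steps, List<String> completed) throws Exception {
        Exception res = null;
        for (Map.Entry<String, Step> entry : steps.entrySet()) {
            try {
                entry.getValue().run();
                completed.add(entry.getKey());
            } catch (Exception e) {
                // As in the excerpt, a later failure overwrites an earlier one.
                res = new RuntimeException("Failed to reset " + entry.getKey(), e);
            }
        }
        if (res != null) {
            throw res;
        }
    }

    /** Three steps where the middle one fails; returns a one-line summary. */
    static String demo() {
        Map<String, Step> steps = new LinkedHashMap<>();
        steps.put("config", () -> { });
        steps.put("global", () -> {
            throw new IllegalStateException("provider unavailable");
        });
        steps.put("secure", () -> { });

        List<String> completed = new ArrayList<>();
        String error = "none";
        try {
            runAll(steps, completed);
        } catch (Exception e) {
            error = e.getMessage();
        }
        return completed + " | " + error;
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In the demo the "secure" step still runs even though "global" failed before it, which is the point of deferring the throw until the end.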

System-level rescue: factory reset

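When the factory reset condition is met, that is, when the rescue level has been escalated all the way to LEVEL_FACTORY_RESET (the fifth and last value in the 0 to 4 range), the following branch runs: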

// Request the reboot from a separate thread to avoid deadlock on PackageWatchdog
// when device shutting down.
Runnable runnable = new Runnable() {
    @Override
    public void run() {
        try {
            RecoverySystem.rebootPromptAndWipeUserData(context, TAG);
        } catch (Throwable t) {
            logRescueException(level, t);
        }
    }
};
Thread thread = new Thread(runnable);
thread.start();
break;

RecoverySystem.rebootPromptAndWipeUserData is called to carry out the factory reset.

In other words, the device reboots into the factory reset prompt.
