cryptfs: kill processes more quickly in wait_and_unmount()

In wait_and_unmount(), kill the processes with open files after umount() has been failing for 2 seconds rather than 17 seconds. This avoids a long boot delay on devices that use FDE.

Detailed explanation:

On FDE devices, vold needs to unmount the tmpfs /data in order to mount the real, decrypted /data. On first boot, it also needs to unmount the unencrypted /data in order to encrypt it in-place.

/data can't be unmounted if files are open inside it. In theory, init is responsible for killing all processes with open files in /data, via the property trigger "vold.decrypt=trigger_shutdown_framework".

However, years ago, commit 6e8440fd50 ("cryptfs: kill processes with open files on tmpfs /data") added a fallback where vold kills the processes itself. Since then, in practice people have increasingly been relying on this fallback, as services keep being added that use /data but don't get stopped by trigger_shutdown_framework. This is slowing down boot, as vold sleeps for 17 seconds before it actually kills the processes.

The problematic services include services that are now started explicitly in the post-fs-data trigger rather than implicitly as part of a class (e.g., tombstoned), as well as services that now need to be started as part of one of the early-boot classes like core or early_hal but can still open files in /data later (e.g., keystore2 and credstore).

Another complication is that on default-encrypted devices (devices with no PIN/pattern/password), trigger_shutdown_framework isn't run at all, but rather it's expected that the relevant services simply weren't started yet. This means that we can't fix the problem just by fixing trigger_shutdown_framework to kill all the needed processes.

Therefore, given that the vold fallback is being relied on in practice, and FDE won't be supported much longer anyway (so simple fixes are very much preferable here), let's just change wait_and_unmount() in vold to use more appropriate timeouts. Instead of waiting for 17 seconds before killing processes, just wait for 2 seconds. Keep the total timeout of 20 seconds, but spend most of it retrying killing the processes, and only if the unmount is still failing. This avoids the long boot delays in practice.

Bug: 187231646
Bug: 186165644
Test: Tested FDE on Cuttlefish, and checked logcat to verify that the boot delay is gone.
Change-Id: Id06a9615a87988c8336396c49ee914b35f8d585b
(cherry picked from commit b4faeb8d444611a6e49b6e3d1364620bd02f22df)
Bug: 189250652
Merged-In: Id06a9615a87988c8336396c49ee914b35f8d585b
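
The kill schedule described above can be modeled in isolation. The following is a minimal sketch, not the vold implementation: `Action`, `ActionForAttempt`, `Result`, and `SimulateWaitAndUnmount` are hypothetical names, the `try_unmount` callback stands in for umount(), and the per-iteration sleep is omitted. Only the branch conditions (`i == 2` for SIGTERM, `i >= 3` for SIGKILL) are taken from the patch.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for the signal vold would send; not a vold type.
enum class Action { kNone, kSigterm, kSigkill };

// Mirrors the new branch in wait_and_unmount(): no signal after the first
// two failed attempts, SIGTERM at attempt index 2, SIGKILL on every later
// failed attempt.
Action ActionForAttempt(int i) {
    if (i == 2) return Action::kSigterm;
    if (i >= 3) return Action::kSigkill;
    return Action::kNone;
}

struct Result {
    bool unmounted;            // true if some attempt succeeded
    int attempts;              // number of unmount attempts made
    std::vector<Action> sent;  // signals sent, in order
};

// Simulates the retry loop. try_unmount(i) is an assumed callback that
// returns true when the unmount at attempt index i succeeds. The real
// loop also sleeps between attempts; that delay is not modeled here.
Result SimulateWaitAndUnmount(int max_attempts,
                              const std::function<bool(int)>& try_unmount) {
    Result r{false, 0, {}};
    for (int i = 0; i < max_attempts; ++i) {
        ++r.attempts;
        if (try_unmount(i)) {
            r.unmounted = true;
            break;
        }
        // The unmount failed: escalate according to the schedule.
        Action a = ActionForAttempt(i);
        if (a != Action::kNone) r.sent.push_back(a);
    }
    return r;
}
```

For instance, calling `SimulateWaitAndUnmount(20, [](int i) { return i >= 4; })` models an unmount that starts succeeding once the offending processes have been killed: the loop makes five attempts and sends one SIGTERM followed by one SIGKILL. The wall-clock time of each step depends on the per-iteration sleep and on the value of WAIT_UNMOUNT_COUNT, neither of which this sketch models.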
3 years ago · 3461ff5c9a
parent 6d5b7f7303
commit 3461ff5c9a
1 changed files with 17 additions and 15 deletions
--- a/cryptfs.cpp
+++ b/cryptfs.cpp
@@ -263,8 +263,6 @@ struct crypt_persist_data {
    struct crypt_persist_entry persist_entry[0];
 };

-static int wait_and_unmount(const char* mountpoint, bool kill);
-
 typedef int (*kdf_func)(const char* passwd, const unsigned char* salt, unsigned char* ikey,
                        void* params);

@@ -1751,7 +1749,7 @@ static void ensure_subdirectory_unmounted(const char *prefix) {
    }
 }

-static int wait_and_unmount(const char* mountpoint, bool kill) {
+static int wait_and_unmount(const char* mountpoint) {
    int i, err, rc;

    // Subdirectory mount will cause a failure of umount.
@@ -1773,15 +1771,19 @@ static int wait_and_unmount(const char* mountpoint, bool kill) {

        err = errno;

-        /* If allowed, be increasingly aggressive before the last 2 seconds */
-        if (kill) {
-            if (i == (WAIT_UNMOUNT_COUNT - 30)) {
-                SLOGW("sending SIGHUP to processes with open files\n");
-                android::vold::KillProcessesWithOpenFiles(mountpoint, SIGTERM);
-            } else if (i == (WAIT_UNMOUNT_COUNT - 20)) {
-                SLOGW("sending SIGKILL to processes with open files\n");
-                android::vold::KillProcessesWithOpenFiles(mountpoint, SIGKILL);
-            }
+        // If it's taking too long, kill the processes with open files.
+        //
+        // Originally this logic was only a fail-safe, but now it's relied on to
+        // kill certain processes that aren't stopped by init because they
+        // aren't in the main or late_start classes.  So to avoid waiting for
+        // too long, we now are fairly aggressive in starting to kill processes.
+        static_assert(WAIT_UNMOUNT_COUNT >= 4);
+        if (i == 2) {
+            SLOGW("sending SIGTERM to processes with open files\n");
+            android::vold::KillProcessesWithOpenFiles(mountpoint, SIGTERM);
+        } else if (i >= 3) {
+            SLOGW("sending SIGKILL to processes with open files\n");
+            android::vold::KillProcessesWithOpenFiles(mountpoint, SIGKILL);
        }

        usleep(100000);
@@ -1927,7 +1929,7 @@ static int cryptfs_restart_internal(int restart_main) {
         SLOGE("fs_crypto_blkdev not set\n");
         return -1;
    }
-    if (!(rc = wait_and_unmount(DATA_MNT_POINT, true))) {
+    if (!(rc = wait_and_unmount(DATA_MNT_POINT))) {
 #endif
 #else
    crypto_blkdev = android::base::GetProperty("ro.crypto.fs_crypto_blkdev", "");
@@ -1936,7 +1938,7 @@ static int cryptfs_restart_internal(int restart_main) {
        return -1;
    }

-    if (!(rc = wait_and_unmount(DATA_MNT_POINT, true))) {
+    if (!(rc = wait_and_unmount(DATA_MNT_POINT))) {
 #endif
        /* If ro.crypto.readonly is set to 1, mount the decrypted
         * filesystem readonly.  This is used when /data is mounted by
@@ -2759,7 +2761,7 @@ int cryptfs_enable_internal(int crypt_type, const char* passwd, int no_ui) {
         * /data, set a property saying we're doing inplace encryption,
         * and restart the framework.
         */
-        wait_and_unmount(DATA_MNT_POINT, true);
+        wait_and_unmount(DATA_MNT_POINT);
        if (fs_mgr_do_tmpfs_mount(DATA_MNT_POINT)) {
            goto error_shutting_down;
        }