NEX-19592 zfs_dbgmsg should not contain info calculated latency
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-17348 The ZFS deadman timer is currently set too high
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
NEX-9200 Improve the scalability of attribute locking in zfs_zget
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-13140 DVA-throttle support for special-class
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9989 Changing volume names can result in double imports and data corruption
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-10069 ZFS_READONLY is a little too strict (fix test lint)
NEX-9553 Move ss_fill gap logic from scan algorithm into range_tree.c
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-6088 ZFS scrub/resilver take excessively long due to issuing lots of random IO
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5856 ddt_capped isn't reset when deduped dataset is destroyed
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-5553 ZFS auto-trim, manual-trim and scrub can race and deadlock
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5188 Removed special-vdev causes panic on read or on get size of special-bp
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5186 smf-tests contains built files and it shouldn't
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-5168 cleanup and productize non-default latency based writecache load-balancer
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-3729 KRRP changes mess up iostat(1M)
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-4807 writecache load-balancing statistics: several distinct problems, must be revisited and revised
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4876 On-demand TRIM shouldn't use system_taskq and should queue jobs
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4683 WRC: Special block pointer must know that it is special
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-4677 Fix for NEX-4619 build breakage
NEX-4620 ZFS autotrim triggering is unreliable
NEX-4622 On-demand TRIM code illogically enumerates metaslabs via mg_ms_tree
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
NEX-4619 Want kstats to monitor TRIM and UNMAP operation
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5818 zfs {ref}compressratio is incorrect with 4k sector size
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4245 WRC: Code cleanup and refactoring to simplify merge with upstream
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4203 spa_config_tryenter incorrectly handles the multiple-lock case
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3965 System may panic on the importing of pool with WRC
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Revert "NEX-3965 System may panic on the importing of pool with WRC"
This reverts commit 45bc50222913cddafde94621d28b78d6efaea897.
NEX-3984 On-demand TRIM
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Conflicts:
        usr/src/common/zfs/zpool_prop.c
        usr/src/uts/common/sys/fs/zfs.h
NEX-3965 System may panic on the importing of pool with WRC
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Conflicts:
    usr/src/uts/common/io/scsi/targets/sd.c
    usr/src/uts/common/sys/scsi/targets/sddef.h
NEX-3165 need some dedup improvements
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
4391 panic system rather than corrupting pool if we hit bug 4390
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-114 Heap leak when exporting/destroying pools with CoS
SUP-577 deadlock between zpool detach and syseventd
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Fixup merge results
re #13333 rb4362 - eliminated spa_update_iotime() to fix the stats
re #12643 rb4064 ZFS meta refactoring - vdev utilization tracking, auto-dedup
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
re #8346 rb2639 KT disk failures
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties

@@ -19,12 +19,12 @@
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
- * Copyright 2015 Nexenta Systems, Inc.  All rights reserved.
  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
+ * Copyright 2019 Nexenta Systems, Inc.  All rights reserved.
  * Copyright 2013 Saso Kiselkov. All rights reserved.
  * Copyright (c) 2014 Integros [integros.com]
  * Copyright (c) 2017 Datto Inc.
  */
 

@@ -50,10 +50,11 @@
 #include <sys/dsl_scan.h>
 #include <sys/fs/zfs.h>
 #include <sys/metaslab_impl.h>
 #include <sys/arc.h>
 #include <sys/ddt.h>
+#include <sys/cos.h>
 #include "zfs_prop.h"
 #include <sys/zfeature.h>
 
 /*
  * SPA locking

@@ -224,10 +225,18 @@
  *
  * spa_rename() is also implemented within this file since it requires
  * manipulation of the namespace.
  */
 
+struct spa_trimstats {
+        kstat_named_t   st_extents;             /* # of extents issued to zio */
+        kstat_named_t   st_bytes;               /* # of bytes issued to zio */
+        kstat_named_t   st_extents_skipped;     /* # of extents too small */
+        kstat_named_t   st_bytes_skipped;       /* bytes in extents_skipped */
+        kstat_named_t   st_auto_slow;           /* trim slow, exts dropped */
+};
+
 static avl_tree_t spa_namespace_avl;
 kmutex_t spa_namespace_lock;
 static kcondvar_t spa_namespace_cv;
 static int spa_active_count;
 int spa_max_replication_override = SPA_DVAS_PER_BP;

@@ -239,19 +248,19 @@
 
 kmem_cache_t *spa_buffer_pool;
 int spa_mode_global;
 
 #ifdef ZFS_DEBUG
-/*
- * Everything except dprintf, spa, and indirect_remap is on by default
- * in debug builds.
- */
-int zfs_flags = ~(ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SPA | ZFS_DEBUG_INDIRECT_REMAP);
+/* Everything except dprintf and spa is on by default in debug builds */
+int zfs_flags = ~(ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SPA);
 #else
 int zfs_flags = 0;
 #endif
 
+#define ZFS_OBJ_MTX_DEFAULT_SZ  64
+uint64_t spa_obj_mtx_sz = ZFS_OBJ_MTX_DEFAULT_SZ;
+
 /*
  * zfs_recover can be set to nonzero to attempt to recover from
  * otherwise-fatal errors, typically caused by on-disk corruption.  When
  * set, calls to zfs_panic_recover() will turn into warning messages.
  * This should only be used as a last resort, as it typically results

@@ -289,18 +298,24 @@
  * leaking space in the "partial temporary" failure case.
  */
 boolean_t zfs_free_leak_on_eio = B_FALSE;
 
 /*
+ * alpha for spa_update_latency() rolling average of pool latency, which
+ * is updated on every txg commit.
+ */
+int64_t zfs_root_latency_alpha = 10;
+
+/*
  * Expiration time in milliseconds. This value has two meanings. First it is
  * used to determine when the spa_deadman() logic should fire. By default the
- * spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
+ * spa_deadman() will fire if spa_sync() has not completed in 250 seconds.
  * Secondly, the value determines if an I/O is considered "hung". Any I/O that
  * has not completed in zfs_deadman_synctime_ms is considered "hung" resulting
  * in a system panic.
  */
-uint64_t zfs_deadman_synctime_ms = 1000000ULL;
+uint64_t zfs_deadman_synctime_ms = 250000ULL;
 
 /*
  * Check time in milliseconds. This defines the frequency at which we check
  * for hung I/O.
  */

@@ -352,40 +367,13 @@
  * See also the comments in zfs_space_check_t.
  */
 int spa_slop_shift = 5;
 uint64_t spa_min_slop = 128 * 1024 * 1024;
 
-/*PRINTFLIKE2*/
-void
-spa_load_failed(spa_t *spa, const char *fmt, ...)
-{
-        va_list adx;
-        char buf[256];
+static void spa_trimstats_create(spa_t *spa);
+static void spa_trimstats_destroy(spa_t *spa);
 
-        va_start(adx, fmt);
-        (void) vsnprintf(buf, sizeof (buf), fmt, adx);
-        va_end(adx);
-
-        zfs_dbgmsg("spa_load(%s, config %s): FAILED: %s", spa->spa_name,
-            spa->spa_trust_config ? "trusted" : "untrusted", buf);
-}
-
-/*PRINTFLIKE2*/
-void
-spa_load_note(spa_t *spa, const char *fmt, ...)
-{
-        va_list adx;
-        char buf[256];
-
-        va_start(adx, fmt);
-        (void) vsnprintf(buf, sizeof (buf), fmt, adx);
-        va_end(adx);
-
-        zfs_dbgmsg("spa_load(%s, config %s): %s", spa->spa_name,
-            spa->spa_trust_config ? "trusted" : "untrusted", buf);
-}
-
 /*
  * ==========================================================================
  * SPA config locking
  * ==========================================================================
  */

@@ -474,11 +462,11 @@
                         scl->scl_writer = curthread;
                 }
                 (void) refcount_add(&scl->scl_count, tag);
                 mutex_exit(&scl->scl_lock);
         }
-        ASSERT3U(wlocks_held, <=, locks);
+        ASSERT(wlocks_held <= locks);
 }
 
 void
 spa_config_exit(spa_t *spa, int locks, void *tag)
 {

@@ -585,10 +573,11 @@
 {
         spa_t *spa;
         spa_config_dirent_t *dp;
         cyc_handler_t hdlr;
         cyc_time_t when;
+        uint64_t guid;
 
         ASSERT(MUTEX_HELD(&spa_namespace_lock));
 
         spa = kmem_zalloc(sizeof (spa_t), KM_SLEEP);
 

@@ -602,17 +591,25 @@
         mutex_init(&spa->spa_cksum_tmpls_lock, NULL, MUTEX_DEFAULT, NULL);
         mutex_init(&spa->spa_scrub_lock, NULL, MUTEX_DEFAULT, NULL);
         mutex_init(&spa->spa_suspend_lock, NULL, MUTEX_DEFAULT, NULL);
         mutex_init(&spa->spa_vdev_top_lock, NULL, MUTEX_DEFAULT, NULL);
         mutex_init(&spa->spa_iokstat_lock, NULL, MUTEX_DEFAULT, NULL);
-        mutex_init(&spa->spa_alloc_lock, NULL, MUTEX_DEFAULT, NULL);
+        mutex_init(&spa->spa_cos_props_lock, NULL, MUTEX_DEFAULT, NULL);
+        mutex_init(&spa->spa_vdev_props_lock, NULL, MUTEX_DEFAULT, NULL);
+        mutex_init(&spa->spa_perfmon.perfmon_lock, NULL, MUTEX_DEFAULT, NULL);
 
+        mutex_init(&spa->spa_auto_trim_lock, NULL, MUTEX_DEFAULT, NULL);
+        mutex_init(&spa->spa_man_trim_lock, NULL, MUTEX_DEFAULT, NULL);
+
         cv_init(&spa->spa_async_cv, NULL, CV_DEFAULT, NULL);
         cv_init(&spa->spa_evicting_os_cv, NULL, CV_DEFAULT, NULL);
         cv_init(&spa->spa_proc_cv, NULL, CV_DEFAULT, NULL);
         cv_init(&spa->spa_scrub_io_cv, NULL, CV_DEFAULT, NULL);
         cv_init(&spa->spa_suspend_cv, NULL, CV_DEFAULT, NULL);
+        cv_init(&spa->spa_auto_trim_done_cv, NULL, CV_DEFAULT, NULL);
+        cv_init(&spa->spa_man_trim_update_cv, NULL, CV_DEFAULT, NULL);
+        cv_init(&spa->spa_man_trim_done_cv, NULL, CV_DEFAULT, NULL);
 
         for (int t = 0; t < TXG_SIZE; t++)
                 bplist_create(&spa->spa_free_bplist[t]);
 
         (void) strlcpy(spa->spa_name, name, sizeof (spa->spa_name));

@@ -620,12 +617,27 @@
         spa->spa_freeze_txg = UINT64_MAX;
         spa->spa_final_txg = UINT64_MAX;
         spa->spa_load_max_txg = UINT64_MAX;
         spa->spa_proc = &p0;
         spa->spa_proc_state = SPA_PROC_NONE;
-        spa->spa_trust_config = B_TRUE;
+        if (spa_obj_mtx_sz < 1 || spa_obj_mtx_sz > INT_MAX)
+                spa->spa_obj_mtx_sz = ZFS_OBJ_MTX_DEFAULT_SZ;
+        else
+                spa->spa_obj_mtx_sz = spa_obj_mtx_sz;
 
+        /*
+         * Grabbing the guid here is just so that spa_config_guid_exists can
+         * check early on to protect against double imports of the same pool
+         * under different names. If the GUID isn't provided here, we will
+         * let spa generate one later on during spa_load, although in that
+         * case we might not be able to provide the double-import protection.
+         */
+        if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID, &guid) == 0) {
+                spa->spa_config_guid = guid;
+                ASSERT(!spa_config_guid_exists(guid));
+        }
+
         hdlr.cyh_func = spa_deadman;
         hdlr.cyh_arg = spa;
         hdlr.cyh_level = CY_LOW_LEVEL;
 
         spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms);

@@ -653,13 +665,10 @@
         if (altroot) {
                 spa->spa_root = spa_strdup(altroot);
                 spa_active_count++;
         }
 
-        avl_create(&spa->spa_alloc_tree, zio_bookmark_compare,
-            sizeof (zio_t), offsetof(zio_t, io_alloc_node));
-
         /*
          * Every pool starts with the default cachefile
          */
         list_create(&spa->spa_config_list, sizeof (spa_config_dirent_t),
             offsetof(spa_config_dirent_t, scd_link));

@@ -687,20 +696,29 @@
                 VERIFY(nvlist_alloc(&spa->spa_label_features, NV_UNIQUE_NAME,
                     KM_SLEEP) == 0);
         }
 
         spa->spa_iokstat = kstat_create("zfs", 0, name,
-            "disk", KSTAT_TYPE_IO, 1, 0);
+            "zfs", KSTAT_TYPE_IO, 1, 0);
         if (spa->spa_iokstat) {
                 spa->spa_iokstat->ks_lock = &spa->spa_iokstat_lock;
                 kstat_install(spa->spa_iokstat);
         }
 
+        spa_trimstats_create(spa);
+
         spa->spa_debug = ((zfs_flags & ZFS_DEBUG_SPA) != 0);
 
+        autosnap_init(spa);
+
+        spa_cos_init(spa);
+
+        spa_special_init(spa);
+
         spa->spa_min_ashift = INT_MAX;
         spa->spa_max_ashift = 0;
+        wbc_init(&spa->spa_wbc, spa);
 
         /*
          * As a pool is being created, treat all features as disabled by
          * setting SPA_FEATURE_DISABLED for all entries in the feature
          * refcount cache.

@@ -741,13 +759,20 @@
                 if (dp->scd_path != NULL)
                         spa_strfree(dp->scd_path);
                 kmem_free(dp, sizeof (spa_config_dirent_t));
         }
 
-        avl_destroy(&spa->spa_alloc_tree);
         list_destroy(&spa->spa_config_list);
 
+        wbc_fini(&spa->spa_wbc);
+
+        spa_special_fini(spa);
+
+        spa_cos_fini(spa);
+
+        autosnap_fini(spa);
+
         nvlist_free(spa->spa_label_features);
         nvlist_free(spa->spa_load_info);
         spa_config_set(spa, NULL);
 
         mutex_enter(&cpu_lock);

@@ -758,10 +783,12 @@
 
         refcount_destroy(&spa->spa_refcount);
 
         spa_config_lock_destroy(spa);
 
+        spa_trimstats_destroy(spa);
+
         kstat_delete(spa->spa_iokstat);
         spa->spa_iokstat = NULL;
 
         for (int t = 0; t < TXG_SIZE; t++)
                 bplist_destroy(&spa->spa_free_bplist[t]);

@@ -771,12 +798,14 @@
         cv_destroy(&spa->spa_async_cv);
         cv_destroy(&spa->spa_evicting_os_cv);
         cv_destroy(&spa->spa_proc_cv);
         cv_destroy(&spa->spa_scrub_io_cv);
         cv_destroy(&spa->spa_suspend_cv);
+        cv_destroy(&spa->spa_auto_trim_done_cv);
+        cv_destroy(&spa->spa_man_trim_update_cv);
+        cv_destroy(&spa->spa_man_trim_done_cv);
 
-        mutex_destroy(&spa->spa_alloc_lock);
         mutex_destroy(&spa->spa_async_lock);
         mutex_destroy(&spa->spa_errlist_lock);
         mutex_destroy(&spa->spa_errlog_lock);
         mutex_destroy(&spa->spa_evicting_os_lock);
         mutex_destroy(&spa->spa_history_lock);

@@ -785,10 +814,14 @@
         mutex_destroy(&spa->spa_cksum_tmpls_lock);
         mutex_destroy(&spa->spa_scrub_lock);
         mutex_destroy(&spa->spa_suspend_lock);
         mutex_destroy(&spa->spa_vdev_top_lock);
         mutex_destroy(&spa->spa_iokstat_lock);
+        mutex_destroy(&spa->spa_cos_props_lock);
+        mutex_destroy(&spa->spa_vdev_props_lock);
+        mutex_destroy(&spa->spa_auto_trim_lock);
+        mutex_destroy(&spa->spa_man_trim_lock);
 
         kmem_free(spa, sizeof (spa_t));
 }
 
 /*

@@ -1108,10 +1141,13 @@
 uint64_t
 spa_vdev_enter(spa_t *spa)
 {
         mutex_enter(&spa->spa_vdev_top_lock);
         mutex_enter(&spa_namespace_lock);
+        mutex_enter(&spa->spa_auto_trim_lock);
+        mutex_enter(&spa->spa_man_trim_lock);
+        spa_trim_stop_wait(spa);
         return (spa_vdev_config_enter(spa));
 }
 
 /*
  * Internal implementation for spa_vdev_enter().  Used when a vdev

@@ -1156,10 +1192,11 @@
         /*
          * Verify the metaslab classes.
          */
         ASSERT(metaslab_class_validate(spa_normal_class(spa)) == 0);
         ASSERT(metaslab_class_validate(spa_log_class(spa)) == 0);
+        ASSERT(metaslab_class_validate(spa_special_class(spa)) == 0);
 
         spa_config_exit(spa, SCL_ALL, spa);
 
         /*
          * Panic the system if the specified tag requires it.  This

@@ -1186,11 +1223,11 @@
 
         /*
          * If the config changed, update the config cache.
          */
         if (config_changed)
-                spa_write_cachefile(spa, B_FALSE, B_TRUE);
+                spa_config_sync(spa, B_FALSE, B_TRUE);
 }
 
 /*
  * Unlock the spa_t after adding or removing a vdev.  Besides undoing the
  * locking of spa_vdev_enter(), we also want make sure the transactions have

@@ -1199,10 +1236,12 @@
  */
 int
 spa_vdev_exit(spa_t *spa, vdev_t *vd, uint64_t txg, int error)
 {
         spa_vdev_config_exit(spa, vd, txg, error, FTAG);
+        mutex_exit(&spa->spa_man_trim_lock);
+        mutex_exit(&spa->spa_auto_trim_lock);
         mutex_exit(&spa_namespace_lock);
         mutex_exit(&spa->spa_vdev_top_lock);
 
         return (error);
 }

@@ -1270,11 +1309,11 @@
         /*
          * If the config changed, update the config cache.
          */
         if (config_changed) {
                 mutex_enter(&spa_namespace_lock);
-                spa_write_cachefile(spa, B_FALSE, B_TRUE);
+                spa_config_sync(spa, B_FALSE, B_TRUE);
                 mutex_exit(&spa_namespace_lock);
         }
 
         return (error);
 }

@@ -1348,11 +1387,11 @@
         txg_wait_synced(spa->spa_dsl_pool, 0);
 
         /*
          * Sync the updated config cache.
          */
-        spa_write_cachefile(spa, B_FALSE, B_TRUE);
+        spa_config_sync(spa, B_FALSE, B_TRUE);
 
         spa_close(spa, FTAG);
 
         mutex_exit(&spa_namespace_lock);
 

@@ -1406,10 +1445,39 @@
 spa_guid_exists(uint64_t pool_guid, uint64_t device_guid)
 {
         return (spa_by_guid(pool_guid, device_guid) != NULL);
 }
 
+/*
+ * Similar to spa_guid_exists, but uses the spa_config_guid and doesn't
+ * filter the check by pool state (as spa_guid_exists does). This is
+ * used to protect against attempting to spa_add the same pool (with the
+ * same pool GUID) under different names. This situation can happen if
+ * the boot_archive contains an outdated zpool.cache file after a pool
+ * rename. That would make us import the pool twice, resulting in data
+ * corruption. Normally the boot_archive shouldn't contain a zpool.cache
+ * file, but if due to misconfiguration it does, this function serves as
+ * a failsafe to prevent the double import.
+ */
+boolean_t
+spa_config_guid_exists(uint64_t pool_guid)
+{
+        spa_t *spa;
+
+        ASSERT(MUTEX_HELD(&spa_namespace_lock));
+        if (pool_guid == 0)
+                return (B_FALSE);
+
+        for (spa = avl_first(&spa_namespace_avl); spa != NULL;
+            spa = AVL_NEXT(&spa_namespace_avl, spa)) {
+                if (spa->spa_config_guid == pool_guid)
+                        return (B_TRUE);
+        }
+
+        return (B_FALSE);
+}
+
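The double-import guard added above is a linear walk of the pool namespace comparing config GUIDs. A minimal userland sketch of the same check, where a flat `fake_spa_t` array stands in for `spa_namespace_avl` (hypothetical names, not the kernel code):

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flat table standing in for spa_namespace_avl. */
typedef struct {
	uint64_t config_guid;
} fake_spa_t;

/*
 * Returns 1 if any known pool already carries pool_guid; 0 otherwise.
 * As in spa_config_guid_exists(), a GUID of 0 (not yet assigned)
 * never matches, so a pool without a config GUID is never blocked.
 */
static int
config_guid_exists(const fake_spa_t *pools, size_t npools, uint64_t pool_guid)
{
	if (pool_guid == 0)
		return (0);
	for (size_t i = 0; i < npools; i++) {
		if (pools[i].config_guid == pool_guid)
			return (1);
	}
	return (0);
}
```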
 char *
 spa_strdup(const char *s)
 {
         size_t len;
         char *new;

@@ -1564,16 +1632,10 @@
 spa_is_initializing(spa_t *spa)
 {
         return (spa->spa_is_initializing);
 }
 
-boolean_t
-spa_indirect_vdevs_loaded(spa_t *spa)
-{
-        return (spa->spa_indirect_vdevs_loaded);
-}
-
 blkptr_t *
 spa_get_rootblkptr(spa_t *spa)
 {
         return (&spa->spa_ubsync.ub_rootbp);
 }

@@ -1696,10 +1758,54 @@
 {
         return (lsize * spa_asize_inflation);
 }
 
 /*
+ * Get either the on-disk (phys == B_TRUE) or the possible in-core DDT size
+ */
+uint64_t
+spa_get_ddts_size(spa_t *spa, boolean_t phys)
+{
+        if (phys)
+                return (spa->spa_ddt_dsize);
+
+        return (spa->spa_ddt_msize);
+}
+
+/*
+ * Check to see if we need to stop DDT growth to stay within some limit
+ */
+boolean_t
+spa_enable_dedup_cap(spa_t *spa)
+{
+        if (zfs_ddt_byte_ceiling != 0) {
+                if (zfs_ddts_msize > zfs_ddt_byte_ceiling) {
+                        /* need to limit DDT to an in core bytecount */
+                        return (B_TRUE);
+                }
+        } else if (zfs_ddt_limit_type == DDT_LIMIT_TO_ARC) {
+                if (zfs_ddts_msize > *arc_ddt_evict_threshold) {
+                        /* need to limit DDT to fit into ARC */
+                        return (B_TRUE);
+                }
+        } else if (zfs_ddt_limit_type == DDT_LIMIT_TO_L2ARC) {
+                if (spa->spa_l2arc_ddt_devs_size != 0) {
+                        if (spa_get_ddts_size(spa, B_TRUE) >
+                            spa->spa_l2arc_ddt_devs_size) {
+                                /* limit DDT to fit into L2ARC DDT dev */
+                                return (B_TRUE);
+                        }
+                } else if (zfs_ddts_msize > *arc_ddt_evict_threshold) {
+                        /* no L2ARC DDT dev - limit DDT to fit into ARC */
+                        return (B_TRUE);
+                }
+        }
+
+        return (B_FALSE);
+}
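The capping ladder in spa_enable_dedup_cap() can be restated as a standalone predicate; passing the tunables as explicit parameters, instead of reading globals such as `zfs_ddt_byte_ceiling` and `*arc_ddt_evict_threshold`, is an assumption of this sketch:

```c
#include <stdint.h>

typedef enum { DDT_LIMIT_TO_ARC, DDT_LIMIT_TO_L2ARC } ddt_limit_t;

/*
 * Standalone model of the decision above: returns 1 when DDT growth
 * should be capped.  Precedence matches the original: a nonzero byte
 * ceiling wins, then the configured limit type, with the L2ARC case
 * falling back to the ARC threshold when no L2ARC DDT device exists.
 */
static int
dedup_cap_needed(uint64_t ddt_msize, uint64_t ddt_dsize,
    uint64_t byte_ceiling, ddt_limit_t limit_type,
    uint64_t l2arc_ddt_devs_size, uint64_t arc_evict_threshold)
{
	if (byte_ceiling != 0)
		return (ddt_msize > byte_ceiling);	/* fixed byte cap */
	if (limit_type == DDT_LIMIT_TO_ARC)
		return (ddt_msize > arc_evict_threshold); /* fit in ARC */
	/* DDT_LIMIT_TO_L2ARC */
	if (l2arc_ddt_devs_size != 0)
		return (ddt_dsize > l2arc_ddt_devs_size); /* fit on dev */
	return (ddt_msize > arc_evict_threshold); /* no dev: use ARC */
}
```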
+
+/*
  * Return the amount of slop space in bytes.  It is 1/32 of the pool (3.2%),
  * or at least 128MB, unless that would cause it to be more than half the
  * pool size.
  *
  * See the comment above spa_slop_shift for details.

@@ -1720,30 +1826,55 @@
 void
 spa_update_dspace(spa_t *spa)
 {
         spa->spa_dspace = metaslab_class_get_dspace(spa_normal_class(spa)) +
             ddt_get_dedup_dspace(spa);
-        if (spa->spa_vdev_removal != NULL) {
+}
+
+/*
+ * EXPERIMENTAL
+ * Use an exponential moving average to track root vdev iotime, as well as
+ * top-level vdev iotime.
+ * The principle: avg_new = avg_prev + (cur - avg_prev) * a / 100; a is
+ * tuneable. For example, if a = 10 (alpha = 0.1), it takes 20 iterations,
+ * or 100 seconds at 5-second txg commit intervals, for the last 20 samples
+ * to account for 66% of the moving average.
+ * Currently, the challenge is that we keep track of iotime in cumulative
+ * nanoseconds since zpool import, both for leaf and top vdevs, so a way of
+ * getting delta pre/post txg commit is required.
+ */
+
+void
+spa_update_latency(spa_t *spa)
+{
+        vdev_t *rvd = spa->spa_root_vdev;
+        vdev_stat_t *rvs = &rvd->vdev_stat;
+        for (int c = 0; c < rvd->vdev_children; c++) {
+                vdev_t *cvd = rvd->vdev_child[c];
+                vdev_stat_t *cvs = &cvd->vdev_stat;
+                mutex_enter(&rvd->vdev_stat_lock);
+
+                for (int t = 0; t < ZIO_TYPES; t++) {
+
                 /*
-                 * We can't allocate from the removing device, so
-                 * subtract its size.  This prevents the DMU/DSL from
-                 * filling up the (now smaller) pool while we are in the
-                 * middle of removing the device.
-                 *
-                 * Note that the DMU/DSL doesn't actually know or care
-                 * how much space is allocated (it does its own tracking
-                 * of how much space has been logically used).  So it
-                 * doesn't matter that the data we are moving may be
-                 * allocated twice (on the old device and the new
-                 * device).
+                         * Non-trivial bit here. We update the moving latency
+                         * average for each child vdev separately, but since we
+                         * want the average to settle at the same rate
+                         * regardless of top level vdev count, we effectively
+                         * divide our alpha by number of children of the root
+                         * vdev to account for that.
                  */
-                vdev_t *vd = spa->spa_vdev_removal->svr_vdev;
-                spa->spa_dspace -= spa_deflate(spa) ?
-                    vd->vdev_stat.vs_dspace : vd->vdev_stat.vs_space;
+                        rvs->vs_latency[t] += ((((int64_t)cvs->vs_latency[t] -
+                            (int64_t)rvs->vs_latency[t]) *
+                            (int64_t)zfs_root_latency_alpha) / 100) /
+                            (int64_t)(rvd->vdev_children);
         }
+                mutex_exit(&rvd->vdev_stat_lock);
+        }
 }
 
+
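The recurrence used above can be checked with a tiny standalone step function. This is a sketch of the integer arithmetic only; the in-kernel version additionally divides alpha by the root vdev's child count so the average settles at the same rate regardless of top-level vdev count:

```c
#include <stdint.h>

/*
 * One step of the moving average from spa_update_latency():
 *   avg_new = avg_prev + (cur - avg_prev) * alpha / 100
 * With alpha = 10 each step moves 10% of the way toward the new sample.
 */
static int64_t
latency_ema_step(int64_t avg_prev, int64_t cur, int64_t alpha)
{
	return (avg_prev + ((cur - avg_prev) * alpha) / 100);
}
```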
 /*
  * Return the failure mode that has been set to this pool. The default
  * behavior will be to block all I/Os when a complete failure occurs.
  */
 uint8_t

@@ -1762,10 +1893,16 @@
 spa_version(spa_t *spa)
 {
         return (spa->spa_ubsync.ub_version);
 }
 
+int
+spa_get_obj_mtx_sz(spa_t *spa)
+{
+        return (spa->spa_obj_mtx_sz);
+}
+
 boolean_t
 spa_deflate(spa_t *spa)
 {
         return (spa->spa_deflate);
 }

@@ -1780,10 +1917,16 @@
 spa_log_class(spa_t *spa)
 {
         return (spa->spa_log_class);
 }
 
+metaslab_class_t *
+spa_special_class(spa_t *spa)
+{
+        return (spa->spa_special_class);
+}
+
 void
 spa_evicting_os_register(spa_t *spa, objset_t *os)
 {
         mutex_enter(&spa->spa_evicting_os_lock);
         list_insert_head(&spa->spa_evicting_os_list, os);

@@ -1808,10 +1951,20 @@
         mutex_exit(&spa->spa_evicting_os_lock);
 
         dmu_buf_user_evict_wait();
 }
 
+uint64_t
+spa_class_alloc_percentage(metaslab_class_t *mc)
+{
+        uint64_t capacity = mc->mc_space;
+        uint64_t alloc = mc->mc_alloc;
+        uint64_t one_percent = capacity / 100;
+
+        return (alloc / one_percent);
+}
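The percentage helper above computes `alloc / (capacity / 100)` in integer arithmetic. A standalone sketch of the same calculation; note this version guards the `capacity < 100` case, which a literal translation would turn into a divide-by-zero (the kernel code presumably never sees a class that small):

```c
#include <stdint.h>

/*
 * Truncated integer percentage of a class's space that is allocated,
 * mirroring spa_class_alloc_percentage(): alloc / (capacity / 100).
 * Returns 0 when capacity is under 100 bytes rather than dividing by 0.
 */
static uint64_t
class_alloc_percentage(uint64_t capacity, uint64_t alloc)
{
	uint64_t one_percent = capacity / 100;

	if (one_percent == 0)
		return (0);
	return (alloc / one_percent);
}
```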
+
 int
 spa_max_replication(spa_t *spa)
 {
         /*
          * As of SPA_VERSION == SPA_VERSION_DITTO_BLOCKS, we are able to

@@ -1833,10 +1986,22 @@
 spa_deadman_synctime(spa_t *spa)
 {
         return (spa->spa_deadman_synctime);
 }
 
+spa_force_trim_t
+spa_get_force_trim(spa_t *spa)
+{
+        return (spa->spa_force_trim);
+}
+
+spa_auto_trim_t
+spa_get_auto_trim(spa_t *spa)
+{
+        return (spa->spa_auto_trim);
+}
+
 uint64_t
 dva_get_dsize_sync(spa_t *spa, const dva_t *dva)
 {
         uint64_t asize = DVA_GET_ASIZE(dva);
         uint64_t dsize = asize;

@@ -1849,30 +2014,41 @@
         }
 
         return (dsize);
 }
 
+/*
+ * This function walks over all DVAs of the given BP and
+ * adds up their sizes.
+ */
 uint64_t
 bp_get_dsize_sync(spa_t *spa, const blkptr_t *bp)
 {
+        /*
+         * SPECIAL-BP has two DVAs, but DVA[0] in this case is a
+         * temporary DVA, and after migration only the DVA[1]
+         * contains valid data. Therefore, we start walking for
+         * these BPs from DVA[1].
+         */
+        int start_dva = BP_IS_SPECIAL(bp) ? 1 : 0;
         uint64_t dsize = 0;
 
-        for (int d = 0; d < BP_GET_NDVAS(bp); d++)
+        for (int d = start_dva; d < BP_GET_NDVAS(bp); d++) {
                 dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]);
+        }
 
         return (dsize);
 }
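The special-BP skip above reduces to choosing the starting DVA index before summing. A standalone model, where the `dva_dsize[]` array stands in for the per-DVA results of `dva_get_dsize_sync()` (an assumption of this sketch):

```c
#include <stdint.h>

#define	DVAS_PER_BP	3	/* matches SPA_DVAS_PER_BP */

/*
 * Model of bp_get_dsize_sync(): sum the per-DVA sizes, starting
 * from DVA[1] for special BPs, since after migration only DVA[1]
 * of a special BP contains valid data.
 */
static uint64_t
bp_dsize_sum(const uint64_t dva_dsize[DVAS_PER_BP], int ndvas,
    int is_special)
{
	uint64_t dsize = 0;

	for (int d = is_special ? 1 : 0; d < ndvas; d++)
		dsize += dva_dsize[d];
	return (dsize);
}
```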
 
 uint64_t
 bp_get_dsize(spa_t *spa, const blkptr_t *bp)
 {
-        uint64_t dsize = 0;
+        uint64_t dsize;
 
         spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER);
 
-        for (int d = 0; d < BP_GET_NDVAS(bp); d++)
-                dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]);
+        dsize = bp_get_dsize_sync(spa, bp);
 
         spa_config_exit(spa, SCL_VDEV, FTAG);
 
         return (dsize);
 }

@@ -1927,10 +2103,18 @@
         avl_create(&spa_l2cache_avl, spa_l2cache_compare, sizeof (spa_aux_t),
             offsetof(spa_aux_t, aux_avl));
 
         spa_mode_global = mode;
 
+         * logevent_max_q_sz from log_sysevent.c gives us an upper bound on
+         * logevent_max_q_sz from log_sysevent.c gives us upper bound on
+         * the number of taskq entries; queueing of sysevents is serialized,
+         * so there is no need for more than one worker thread
+         */
+        spa_sysevent_taskq = taskq_create("spa_sysevent_tq", 1,
+            minclsyspri, 1, 5000, TASKQ_DYNAMIC);
+
 #ifdef _KERNEL
         spa_arch_init();
 #else
         if (spa_mode_global != FREAD && dprintf_find_string("watch")) {
                 arc_procfd = open("/proc/self/ctl", O_WRONLY);

@@ -1952,17 +2136,23 @@
         zil_init();
         vdev_cache_stat_init();
         zfs_prop_init();
         zpool_prop_init();
         zpool_feature_init();
+        vdev_prop_init();
+        cos_prop_init();
         spa_config_load();
         l2arc_start();
+        ddt_init();
+        dsl_scan_global_init();
 }
 
 void
 spa_fini(void)
 {
+        ddt_fini();
+
         l2arc_stop();
 
         spa_evict_all();
 
         vdev_cache_stat_fini();

@@ -1972,10 +2162,12 @@
         metaslab_alloc_trace_fini();
         range_tree_fini();
         unique_fini();
         refcount_fini();
 
+        taskq_destroy(spa_sysevent_taskq);
+
         avl_destroy(&spa_namespace_avl);
         avl_destroy(&spa_spare_avl);
         avl_destroy(&spa_l2cache_avl);
 
         cv_destroy(&spa_namespace_cv);

@@ -2014,11 +2206,11 @@
 }
 
 boolean_t
 spa_writeable(spa_t *spa)
 {
-        return (!!(spa->spa_mode & FWRITE) && spa->spa_trust_config);
+        return (!!(spa->spa_mode & FWRITE));
 }
 
 /*
  * Returns true if there is a pending sync task in any of the current
  * syncing txg, the current quiescing txg, or the current open txg.

@@ -2027,10 +2219,16 @@
 spa_has_pending_synctask(spa_t *spa)
 {
         return (!txg_all_lists_empty(&spa->spa_dsl_pool->dp_sync_tasks));
 }
 
+boolean_t
+spa_has_special(spa_t *spa)
+{
+        return (spa->spa_special_class->mc_rotor != NULL);
+}
+
 int
 spa_mode(spa_t *spa)
 {
         return (spa->spa_mode);
 }

@@ -2071,10 +2269,11 @@
                 spa->spa_scan_pass_scrub_pause = spa->spa_scan_pass_start;
         else
                 spa->spa_scan_pass_scrub_pause = 0;
         spa->spa_scan_pass_scrub_spent_paused = 0;
         spa->spa_scan_pass_exam = 0;
+        spa->spa_scan_pass_work = 0;
         vdev_scan_stat_init(spa->spa_root_vdev);
 }
 
 /*
  * Get scan stats for zpool status reports

@@ -2096,14 +2295,18 @@
         ps->pss_examined = scn->scn_phys.scn_examined;
         ps->pss_to_process = scn->scn_phys.scn_to_process;
         ps->pss_processed = scn->scn_phys.scn_processed;
         ps->pss_errors = scn->scn_phys.scn_errors;
         ps->pss_state = scn->scn_phys.scn_state;
+        mutex_enter(&scn->scn_status_lock);
+        ps->pss_issued = scn->scn_bytes_issued;
+        mutex_exit(&scn->scn_status_lock);
 
         /* data not stored on disk */
         ps->pss_pass_start = spa->spa_scan_pass_start;
         ps->pss_pass_exam = spa->spa_scan_pass_exam;
+        ps->pss_pass_work = spa->spa_scan_pass_work;
         ps->pss_pass_scrub_pause = spa->spa_scan_pass_scrub_pause;
         ps->pss_pass_scrub_spent_paused = spa->spa_scan_pass_scrub_spent_paused;
 
         return (0);
 }

@@ -2121,64 +2324,207 @@
                 return (SPA_MAXBLOCKSIZE);
         else
                 return (SPA_OLD_MAXBLOCKSIZE);
 }
 
+boolean_t
+spa_wbc_present(spa_t *spa)
+{
+        return (spa->spa_wbc_mode != WBC_MODE_OFF);
+}
+
+boolean_t
+spa_wbc_active(spa_t *spa)
+{
+        return (spa->spa_wbc_mode == WBC_MODE_ACTIVE);
+}
+
+int
+spa_wbc_mode(const char *name)
+{
+        int ret = 0;
+        spa_t *spa;
+
+        mutex_enter(&spa_namespace_lock);
+        spa = spa_lookup(name);
+        if (!spa) {
+                mutex_exit(&spa_namespace_lock);
+                return (-1);
+        }
+
+        ret = (int)spa->spa_wbc_mode;
+        mutex_exit(&spa_namespace_lock);
+        return (ret);
+}
+
+struct zfs_autosnap *
+spa_get_autosnap(spa_t *spa)
+{
+        return (&spa->spa_autosnap);
+}
+
+wbc_data_t *
+spa_get_wbc_data(spa_t *spa)
+{
+        return (&spa->spa_wbc);
+}
+
 /*
- * Returns the txg that the last device removal completed. No indirect mappings
- * have been added since this txg.
+ * Creates the trim kstats structure for a spa.
  */
-uint64_t
-spa_get_last_removal_txg(spa_t *spa)
+static void
+spa_trimstats_create(spa_t *spa)
 {
-        uint64_t vdevid;
-        uint64_t ret = -1ULL;
+        /* truncate the pool name to accommodate the "_trimstats" suffix */
+        char short_spa_name[KSTAT_STRLEN - 10];
+        char name[KSTAT_STRLEN];
 
-        spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER);
-        /*
-         * sr_prev_indirect_vdev is only modified while holding all the
-         * config locks, so it is sufficient to hold SCL_VDEV as reader when
-         * examining it.
-         */
-        vdevid = spa->spa_removing_phys.sr_prev_indirect_vdev;
+        ASSERT3P(spa->spa_trimstats, ==, NULL);
+        ASSERT3P(spa->spa_trimstats_ks, ==, NULL);
 
-        while (vdevid != -1ULL) {
-                vdev_t *vd = vdev_lookup_top(spa, vdevid);
-                vdev_indirect_births_t *vib = vd->vdev_indirect_births;
+        (void) snprintf(short_spa_name, sizeof (short_spa_name), "%s",
+            spa->spa_name);
+        (void) snprintf(name, sizeof (name), "%s_trimstats", short_spa_name);
 
-                ASSERT3P(vd->vdev_ops, ==, &vdev_indirect_ops);
+        spa->spa_trimstats_ks = kstat_create("zfs", 0, name, "misc",
+            KSTAT_TYPE_NAMED, sizeof (*spa->spa_trimstats) /
+            sizeof (kstat_named_t), 0);
+        if (spa->spa_trimstats_ks) {
+                spa->spa_trimstats = spa->spa_trimstats_ks->ks_data;
 
-                /*
-                 * If the removal did not remap any data, we don't care.
+#ifdef _KERNEL
+                kstat_named_init(&spa->spa_trimstats->st_extents,
+                    "extents", KSTAT_DATA_UINT64);
+                kstat_named_init(&spa->spa_trimstats->st_bytes,
+                    "bytes", KSTAT_DATA_UINT64);
+                kstat_named_init(&spa->spa_trimstats->st_extents_skipped,
+                    "extents_skipped", KSTAT_DATA_UINT64);
+                kstat_named_init(&spa->spa_trimstats->st_bytes_skipped,
+                    "bytes_skipped", KSTAT_DATA_UINT64);
+                kstat_named_init(&spa->spa_trimstats->st_auto_slow,
+                    "auto_slow", KSTAT_DATA_UINT64);
+#endif  /* _KERNEL */
+
+                kstat_install(spa->spa_trimstats_ks);
+        } else {
+                cmn_err(CE_NOTE, "!Cannot create trim kstats for pool %s",
+                    spa->spa_name);
+        }
+}
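The two-step `snprintf` in `spa_trimstats_create` guarantees the kstat name never exceeds the kstat name limit even for long pool names. A standalone sketch of just that truncation logic (the helper name `make_trimstats_name` is hypothetical; `KSTAT_STRLEN` is assumed to be 31, as in illumos `kstat.h`):

```c
#include <stdio.h>
#include <string.h>

#define	KSTAT_STRLEN	31	/* assumed value, matching illumos */

/*
 * Build "<pool>_trimstats", first truncating the pool name so that the
 * 10-character "_trimstats" suffix always fits within KSTAT_STRLEN.
 */
static void
make_trimstats_name(const char *pool, char name[KSTAT_STRLEN])
{
	char short_name[KSTAT_STRLEN - 10];

	(void) snprintf(short_name, sizeof (short_name), "%s", pool);
	(void) snprintf(name, KSTAT_STRLEN, "%s_trimstats", short_name);
}
```

A short pool name like "tank" passes through unchanged ("tank_trimstats"), while an overlong one is clipped so the suffix survives intact.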
+
+/*
+ * Destroys the trim kstats for a spa.
                  */
-                if (vdev_indirect_births_count(vib) != 0) {
-                        ret = vdev_indirect_births_last_entry_txg(vib);
-                        break;
+static void
+spa_trimstats_destroy(spa_t *spa)
+{
+        if (spa->spa_trimstats_ks) {
+                kstat_delete(spa->spa_trimstats_ks);
+                spa->spa_trimstats = NULL;
+                spa->spa_trimstats_ks = NULL;
                 }
+}
 
-                vdevid = vd->vdev_indirect_config.vic_prev_indirect_vdev;
+/*
+ * Updates the numerical trim kstats for a spa.
+ */
+void
+spa_trimstats_update(spa_t *spa, uint64_t extents, uint64_t bytes,
+    uint64_t extents_skipped, uint64_t bytes_skipped)
+{
+        spa_trimstats_t *st = spa->spa_trimstats;
+        if (st) {
+                atomic_add_64(&st->st_extents.value.ui64, extents);
+                atomic_add_64(&st->st_bytes.value.ui64, bytes);
+                atomic_add_64(&st->st_extents_skipped.value.ui64,
+                    extents_skipped);
+                atomic_add_64(&st->st_bytes_skipped.value.ui64,
+                    bytes_skipped);
         }
-        spa_config_exit(spa, SCL_VDEV, FTAG);
+}
 
-        IMPLY(ret != -1ULL,
-            spa_feature_is_active(spa, SPA_FEATURE_DEVICE_REMOVAL));
+/*
+ * Increments the slow-trim kstat for a spa.
+ */
+void
+spa_trimstats_auto_slow_incr(spa_t *spa)
+{
+        spa_trimstats_t *st = spa->spa_trimstats;
+        if (st)
+                atomic_inc_64(&st->st_auto_slow.value.ui64);
+}
 
-        return (ret);
+/*
+ * Creates the taskq used for dispatching auto-trim. This is called only
+ * when the autotrim property is set to `on', or when a pool with the
+ * property set to `on' is loaded.
+ */
+void
+spa_auto_trim_taskq_create(spa_t *spa)
+{
+        char name[MAXPATHLEN];
+        ASSERT(MUTEX_HELD(&spa->spa_auto_trim_lock));
+        ASSERT(spa->spa_auto_trim_taskq == NULL);
+        (void) snprintf(name, sizeof (name), "%s_auto_trim", spa->spa_name);
+        spa->spa_auto_trim_taskq = taskq_create(name, 1, minclsyspri, 1,
+            spa->spa_root_vdev->vdev_children, TASKQ_DYNAMIC);
+        VERIFY(spa->spa_auto_trim_taskq != NULL);
 }
 
-boolean_t
-spa_trust_config(spa_t *spa)
+/*
+ * Creates the taskq for dispatching manual trim. This taskq is recreated
+ * each time `zpool trim <poolname>' is issued and destroyed after the run
+ * completes in an async spa request.
+ */
+void
+spa_man_trim_taskq_create(spa_t *spa)
 {
-        return (spa->spa_trust_config);
+        char name[MAXPATHLEN];
+        ASSERT(MUTEX_HELD(&spa->spa_man_trim_lock));
+        spa_async_unrequest(spa, SPA_ASYNC_MAN_TRIM_TASKQ_DESTROY);
+        if (spa->spa_man_trim_taskq != NULL)
+                /*
+                 * The async taskq destroy has been pre-empted, so just
+                 * return, the taskq is still good to use.
+                 */
+                return;
+        (void) snprintf(name, sizeof (name), "%s_man_trim", spa->spa_name);
+        spa->spa_man_trim_taskq = taskq_create(name, 1, minclsyspri, 1,
+            spa->spa_root_vdev->vdev_children, TASKQ_DYNAMIC);
+        VERIFY(spa->spa_man_trim_taskq != NULL);
 }
 
-uint64_t
-spa_missing_tvds_allowed(spa_t *spa)
+/*
+ * Destroys the taskq created in spa_auto_trim_taskq_create. The taskq
+ * is only destroyed when the autotrim property is set to `off'.
+ */
+void
+spa_auto_trim_taskq_destroy(spa_t *spa)
 {
-        return (spa->spa_missing_tvds_allowed);
+        ASSERT(MUTEX_HELD(&spa->spa_auto_trim_lock));
+        ASSERT(spa->spa_auto_trim_taskq != NULL);
+        while (spa->spa_num_auto_trimming != 0)
+                cv_wait(&spa->spa_auto_trim_done_cv, &spa->spa_auto_trim_lock);
+        taskq_destroy(spa->spa_auto_trim_taskq);
+        spa->spa_auto_trim_taskq = NULL;
 }
 
+/*
+ * Destroys the taskq created in spa_man_trim_taskq_create. The taskq is
+ * destroyed after a manual trim run completes from an async spa request.
+ * There is a bit of lag between an async request being issued at the
+ * completion of a trim run and that request finally being acted on, which
+ * is why this function checks whether new manual trimming threads have
+ * been spawned in the meantime. If they have, we assume the async spa
+ * request has been preempted by another manual trim request and back off.
+ */
 void
-spa_set_missing_tvds(spa_t *spa, uint64_t missing)
+spa_man_trim_taskq_destroy(spa_t *spa)
 {
-        spa->spa_missing_tvds = missing;
+        ASSERT(MUTEX_HELD(&spa->spa_man_trim_lock));
+        ASSERT(spa->spa_man_trim_taskq != NULL);
+        if (spa->spa_num_man_trimming != 0)
+                /* another trim got started before we got here, back off */
+                return;
+        taskq_destroy(spa->spa_man_trim_taskq);
+        spa->spa_man_trim_taskq = NULL;
 }
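The interplay between `spa_man_trim_taskq_create` and the deferred destroy can be modeled as a small state machine. This is a single-threaded sketch under hypothetical names (`trim_model_t` and friends), ignoring the real locks and taskq machinery; it only demonstrates why the destroy path must back off when a new trim run has started.

```c
#include <assert.h>

/*
 * Hypothetical single-threaded model of the manual-trim taskq lifecycle:
 * destroy is requested asynchronously and may be preempted by a new
 * trim run that reuses the still-live taskq.
 */
typedef struct {
	int	taskq_alive;		/* stands in for spa_man_trim_taskq != NULL */
	int	num_trimming;		/* stands in for spa_num_man_trimming */
	int	destroy_requested;	/* stands in for the pending async request */
} trim_model_t;

static void
model_trim_start(trim_model_t *m)
{
	m->destroy_requested = 0;	/* cancel any pending async destroy */
	if (!m->taskq_alive)
		m->taskq_alive = 1;	/* (re)create the taskq */
	m->num_trimming++;
}

static void
model_trim_done(trim_model_t *m)
{
	m->num_trimming--;
	m->destroy_requested = 1;	/* request the async destroy */
}

static void
model_async_destroy(trim_model_t *m)
{
	if (!m->destroy_requested)
		return;
	if (m->num_trimming != 0)
		return;			/* a new run started: back off */
	m->taskq_alive = 0;
	m->destroy_requested = 0;
}
```

Running start/done/destroy tears the taskq down, while a second start issued before the destroy fires cancels the request and keeps the taskq alive, mirroring the preemption case described in the comment above `spa_man_trim_taskq_destroy`.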