NEX-19592 zfs_dbgmsg should not contain info calculated latency
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-17348 The ZFS deadman timer is currently set too high
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
NEX-9200 Improve the scalability of attribute locking in zfs_zget
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-13140 DVA-throttle support for special-class
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9989 Changing volume names can result in double imports and data corruption
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-10069 ZFS_READONLY is a little too strict (fix test lint)
NEX-9553 Move ss_fill gap logic from scan algorithm into range_tree.c
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-6088 ZFS scrub/resilver take excessively long due to issuing lots of random IO
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5856 ddt_capped isn't reset when deduped dataset is destroyed
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-5553 ZFS auto-trim, manual-trim and scrub can race and deadlock
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5188 Removed special-vdev causes panic on read or on get size of special-bp
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5186 smf-tests contains built files and it shouldn't
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-5168 cleanup and productize non-default latency based writecache load-balancer
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-3729 KRRP changes mess up iostat(1M)
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-4807 writecache load-balancing statistics: several distinct problems, must be revisited and revised
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4876 On-demand TRIM shouldn't use system_taskq and should queue jobs
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4683 WRC: Special block pointer must know that it is special
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-4677 Fix for NEX-4619 build breakage
NEX-4620 ZFS autotrim triggering is unreliable
NEX-4622 On-demand TRIM code illogically enumerates metaslabs via mg_ms_tree
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
NEX-4619 Want kstats to monitor TRIM and UNMAP operation
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5818 zfs {ref}compressratio is incorrect with 4k sector size
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4245 WRC: Code cleanup and refactoring to simplify merge with upstream
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4203 spa_config_tryenter incorrectly handles the multiple-lock case
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3965 System may panic on the importing of pool with WRC
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Revert "NEX-3965 System may panic on the importing of pool with WRC"
This reverts commit 45bc50222913cddafde94621d28b78d6efaea897.
NEX-3984 On-demand TRIM
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Conflicts:
        usr/src/common/zfs/zpool_prop.c
        usr/src/uts/common/sys/fs/zfs.h
NEX-3965 System may panic on the importing of pool with WRC
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Conflicts:
    usr/src/uts/common/io/scsi/targets/sd.c
    usr/src/uts/common/sys/scsi/targets/sddef.h
NEX-3165 need some dedup improvements
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
4391 panic system rather than corrupting pool if we hit bug 4390
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-114 Heap leak when exporting/destroying pools with CoS
SUP-577 deadlock between zpool detach and syseventd
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Fixup merge results
re #13333 rb4362 - eliminated spa_update_iotime() to fix the stats
re #12643 rb4064 ZFS meta refactoring - vdev utilization tracking, auto-dedup
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
re #8346 rb2639 KT disk failures
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties

*** 19,30 ****
   * CDDL HEADER END
   */
  
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
   * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
- * Copyright 2015 Nexenta Systems, Inc. All rights reserved.
   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
   * Copyright 2013 Saso Kiselkov. All rights reserved.
   * Copyright (c) 2014 Integros [integros.com]
   * Copyright (c) 2017 Datto Inc.
   */
--- 19,30 ----
   * CDDL HEADER END
   */
  
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
   * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
+ * Copyright 2019 Nexenta Systems, Inc. All rights reserved.
   * Copyright 2013 Saso Kiselkov. All rights reserved.
   * Copyright (c) 2014 Integros [integros.com]
   * Copyright (c) 2017 Datto Inc.
   */
*** 50,59 ****
--- 50,60 ----
  #include <sys/dsl_scan.h>
  #include <sys/fs/zfs.h>
  #include <sys/metaslab_impl.h>
  #include <sys/arc.h>
  #include <sys/ddt.h>
+ #include <sys/cos.h>
  #include "zfs_prop.h"
  #include <sys/zfeature.h>
  
  /*
   * SPA locking
*** 224,233 **** --- 225,242 ---- * * spa_rename() is also implemented within this file since it requires * manipulation of the namespace. */ + struct spa_trimstats { + kstat_named_t st_extents; /* # of extents issued to zio */ + kstat_named_t st_bytes; /* # of bytes issued to zio */ + kstat_named_t st_extents_skipped; /* # of extents too small */ + kstat_named_t st_bytes_skipped; /* bytes in extents_skipped */ + kstat_named_t st_auto_slow; /* trim slow, exts dropped */ + }; + static avl_tree_t spa_namespace_avl; kmutex_t spa_namespace_lock; static kcondvar_t spa_namespace_cv; static int spa_active_count; int spa_max_replication_override = SPA_DVAS_PER_BP;
*** 239,257 **** kmem_cache_t *spa_buffer_pool; int spa_mode_global; #ifdef ZFS_DEBUG ! /* ! * Everything except dprintf, spa, and indirect_remap is on by default ! * in debug builds. ! */ ! int zfs_flags = ~(ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SPA | ZFS_DEBUG_INDIRECT_REMAP); #else int zfs_flags = 0; #endif /* * zfs_recover can be set to nonzero to attempt to recover from * otherwise-fatal errors, typically caused by on-disk corruption. When * set, calls to zfs_panic_recover() will turn into warning messages. * This should only be used as a last resort, as it typically results --- 248,266 ---- kmem_cache_t *spa_buffer_pool; int spa_mode_global; #ifdef ZFS_DEBUG ! /* Everything except dprintf and spa is on by default in debug builds */ ! int zfs_flags = ~(ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SPA); #else int zfs_flags = 0; #endif + #define ZFS_OBJ_MTX_DEFAULT_SZ 64 + uint64_t spa_obj_mtx_sz = ZFS_OBJ_MTX_DEFAULT_SZ; + /* * zfs_recover can be set to nonzero to attempt to recover from * otherwise-fatal errors, typically caused by on-disk corruption. When * set, calls to zfs_panic_recover() will turn into warning messages. * This should only be used as a last resort, as it typically results
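Editor's note: the new spa_obj_mtx_sz tunable (defaulting to ZFS_OBJ_MTX_DEFAULT_SZ, i.e. 64) sizes the per-objset array of object mutexes that NEX-9200 uses to spread zfs_zget() attribute locking across buckets instead of a single lock. The sketch below only illustrates the bucketing idea; the array and function names are hypothetical, since the actual consumer is not part of this hunk.

    #include <stdint.h>

    #define SPA_OBJ_MTX_SZ  64      /* mirrors ZFS_OBJ_MTX_DEFAULT_SZ above */

    /* stand-in lock type so the sketch is self-contained */
    typedef struct obj_lock { int held; } obj_lock_t;

    static obj_lock_t obj_locks[SPA_OBJ_MTX_SZ];

    /*
     * Hash the object number to one of the buckets (the size must be a
     * power of two for the mask to work), so lookups of different objects
     * rarely contend on the same mutex.
     */
    static obj_lock_t *
    obj_lock_for(uint64_t obj)
    {
            return (&obj_locks[obj & (SPA_OBJ_MTX_SZ - 1)]);
    }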
*** 289,306 **** * leaking space in the "partial temporary" failure case. */ boolean_t zfs_free_leak_on_eio = B_FALSE; /* * Expiration time in milliseconds. This value has two meanings. First it is * used to determine when the spa_deadman() logic should fire. By default the ! * spa_deadman() will fire if spa_sync() has not completed in 1000 seconds. * Secondly, the value determines if an I/O is considered "hung". Any I/O that * has not completed in zfs_deadman_synctime_ms is considered "hung" resulting * in a system panic. */ ! uint64_t zfs_deadman_synctime_ms = 1000000ULL; /* * Check time in milliseconds. This defines the frequency at which we check * for hung I/O. */ --- 298,321 ---- * leaking space in the "partial temporary" failure case. */ boolean_t zfs_free_leak_on_eio = B_FALSE; /* + * alpha for spa_update_latency() rolling average of pool latency, which + * is updated on every txg commit. + */ + int64_t zfs_root_latency_alpha = 10; + + /* * Expiration time in milliseconds. This value has two meanings. First it is * used to determine when the spa_deadman() logic should fire. By default the ! * spa_deadman() will fire if spa_sync() has not completed in 250 seconds. * Secondly, the value determines if an I/O is considered "hung". Any I/O that * has not completed in zfs_deadman_synctime_ms is considered "hung" resulting * in a system panic. */ ! uint64_t zfs_deadman_synctime_ms = 250000ULL; /* * Check time in milliseconds. This defines the frequency at which we check * for hung I/O. */
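Editor's note: the hunk above lowers the deadman default from 1000 to 250 seconds (zfs_deadman_synctime_ms = 250000, per NEX-17348) and adds the zfs_root_latency_alpha tunable used later by spa_update_latency(). The same deadman value serves both purposes described in the comment; the following is a minimal user-land sketch of the "hung I/O" half of that check (names are illustrative, not the kernel code):

    #include <stdbool.h>
    #include <stdint.h>

    /* mirrors the new default above: 250000 ms = 250 s */
    static uint64_t zfs_deadman_synctime_ms = 250000ULL;

    /*
     * An I/O is considered "hung" once it has been outstanding for longer
     * than zfs_deadman_synctime_ms; the comparison is done in nanoseconds.
     */
    static bool
    io_is_hung(uint64_t io_start_ns, uint64_t now_ns)
    {
            return (now_ns - io_start_ns >
                zfs_deadman_synctime_ms * 1000000ULL);
    }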
*** 352,391 **** * See also the comments in zfs_space_check_t. */ int spa_slop_shift = 5; uint64_t spa_min_slop = 128 * 1024 * 1024; ! /*PRINTFLIKE2*/ ! void ! spa_load_failed(spa_t *spa, const char *fmt, ...) ! { ! va_list adx; ! char buf[256]; - va_start(adx, fmt); - (void) vsnprintf(buf, sizeof (buf), fmt, adx); - va_end(adx); - - zfs_dbgmsg("spa_load(%s, config %s): FAILED: %s", spa->spa_name, - spa->spa_trust_config ? "trusted" : "untrusted", buf); - } - - /*PRINTFLIKE2*/ - void - spa_load_note(spa_t *spa, const char *fmt, ...) - { - va_list adx; - char buf[256]; - - va_start(adx, fmt); - (void) vsnprintf(buf, sizeof (buf), fmt, adx); - va_end(adx); - - zfs_dbgmsg("spa_load(%s, config %s): %s", spa->spa_name, - spa->spa_trust_config ? "trusted" : "untrusted", buf); - } - /* * ========================================================================== * SPA config locking * ========================================================================== */ --- 367,379 ---- * See also the comments in zfs_space_check_t. */ int spa_slop_shift = 5; uint64_t spa_min_slop = 128 * 1024 * 1024; ! static void spa_trimstats_create(spa_t *spa); ! static void spa_trimstats_destroy(spa_t *spa); /* * ========================================================================== * SPA config locking * ========================================================================== */
*** 474,484 ****
              scl->scl_writer = curthread;
          }
          (void) refcount_add(&scl->scl_count, tag);
          mutex_exit(&scl->scl_lock);
      }
!     ASSERT3U(wlocks_held, <=, locks);
  }
  
  void
  spa_config_exit(spa_t *spa, int locks, void *tag)
  {
--- 462,472 ----
              scl->scl_writer = curthread;
          }
          (void) refcount_add(&scl->scl_count, tag);
          mutex_exit(&scl->scl_lock);
      }
!     ASSERT(wlocks_held <= locks);
  }
  
  void
  spa_config_exit(spa_t *spa, int locks, void *tag)
  {
*** 585,594 **** --- 573,583 ---- { spa_t *spa; spa_config_dirent_t *dp; cyc_handler_t hdlr; cyc_time_t when; + uint64_t guid; ASSERT(MUTEX_HELD(&spa_namespace_lock)); spa = kmem_zalloc(sizeof (spa_t), KM_SLEEP);
*** 602,618 **** mutex_init(&spa->spa_cksum_tmpls_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_scrub_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_suspend_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_vdev_top_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_iokstat_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&spa->spa_alloc_lock, NULL, MUTEX_DEFAULT, NULL); cv_init(&spa->spa_async_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_evicting_os_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_proc_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_scrub_io_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_suspend_cv, NULL, CV_DEFAULT, NULL); for (int t = 0; t < TXG_SIZE; t++) bplist_create(&spa->spa_free_bplist[t]); (void) strlcpy(spa->spa_name, name, sizeof (spa->spa_name)); --- 591,615 ---- mutex_init(&spa->spa_cksum_tmpls_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_scrub_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_suspend_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_vdev_top_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_iokstat_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&spa->spa_cos_props_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&spa->spa_vdev_props_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&spa->spa_perfmon.perfmon_lock, NULL, MUTEX_DEFAULT, NULL); + mutex_init(&spa->spa_auto_trim_lock, NULL, MUTEX_DEFAULT, NULL); + mutex_init(&spa->spa_man_trim_lock, NULL, MUTEX_DEFAULT, NULL); + cv_init(&spa->spa_async_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_evicting_os_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_proc_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_scrub_io_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_suspend_cv, NULL, CV_DEFAULT, NULL); + cv_init(&spa->spa_auto_trim_done_cv, NULL, CV_DEFAULT, NULL); + cv_init(&spa->spa_man_trim_update_cv, NULL, CV_DEFAULT, NULL); + cv_init(&spa->spa_man_trim_done_cv, NULL, CV_DEFAULT, NULL); for (int t = 0; t < TXG_SIZE; t++) bplist_create(&spa->spa_free_bplist[t]); (void) strlcpy(spa->spa_name, name, sizeof (spa->spa_name));
*** 620,631 **** spa->spa_freeze_txg = UINT64_MAX; spa->spa_final_txg = UINT64_MAX; spa->spa_load_max_txg = UINT64_MAX; spa->spa_proc = &p0; spa->spa_proc_state = SPA_PROC_NONE; ! spa->spa_trust_config = B_TRUE; hdlr.cyh_func = spa_deadman; hdlr.cyh_arg = spa; hdlr.cyh_level = CY_LOW_LEVEL; spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms); --- 617,643 ---- spa->spa_freeze_txg = UINT64_MAX; spa->spa_final_txg = UINT64_MAX; spa->spa_load_max_txg = UINT64_MAX; spa->spa_proc = &p0; spa->spa_proc_state = SPA_PROC_NONE; ! if (spa_obj_mtx_sz < 1 || spa_obj_mtx_sz > INT_MAX) ! spa->spa_obj_mtx_sz = ZFS_OBJ_MTX_DEFAULT_SZ; ! else ! spa->spa_obj_mtx_sz = spa_obj_mtx_sz; + /* + * Grabbing the guid here is just so that spa_config_guid_exists can + * check early on to protect against doubled imports of the same pool + * under different names. If the GUID isn't provided here, we will + * let spa generate one later on during spa_load, although in that + * case we might not be able to provide the double-import protection. + */ + if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID, &guid) == 0) { + spa->spa_config_guid = guid; + ASSERT(!spa_config_guid_exists(guid)); + } + hdlr.cyh_func = spa_deadman; hdlr.cyh_arg = spa; hdlr.cyh_level = CY_LOW_LEVEL; spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms);
*** 653,665 **** if (altroot) { spa->spa_root = spa_strdup(altroot); spa_active_count++; } - avl_create(&spa->spa_alloc_tree, zio_bookmark_compare, - sizeof (zio_t), offsetof(zio_t, io_alloc_node)); - /* * Every pool starts with the default cachefile */ list_create(&spa->spa_config_list, sizeof (spa_config_dirent_t), offsetof(spa_config_dirent_t, scd_link)); --- 665,674 ----
*** 687,706 **** VERIFY(nvlist_alloc(&spa->spa_label_features, NV_UNIQUE_NAME, KM_SLEEP) == 0); } spa->spa_iokstat = kstat_create("zfs", 0, name, ! "disk", KSTAT_TYPE_IO, 1, 0); if (spa->spa_iokstat) { spa->spa_iokstat->ks_lock = &spa->spa_iokstat_lock; kstat_install(spa->spa_iokstat); } spa->spa_debug = ((zfs_flags & ZFS_DEBUG_SPA) != 0); spa->spa_min_ashift = INT_MAX; spa->spa_max_ashift = 0; /* * As a pool is being created, treat all features as disabled by * setting SPA_FEATURE_DISABLED for all entries in the feature * refcount cache. --- 696,724 ---- VERIFY(nvlist_alloc(&spa->spa_label_features, NV_UNIQUE_NAME, KM_SLEEP) == 0); } spa->spa_iokstat = kstat_create("zfs", 0, name, ! "zfs", KSTAT_TYPE_IO, 1, 0); if (spa->spa_iokstat) { spa->spa_iokstat->ks_lock = &spa->spa_iokstat_lock; kstat_install(spa->spa_iokstat); } + spa_trimstats_create(spa); + spa->spa_debug = ((zfs_flags & ZFS_DEBUG_SPA) != 0); + autosnap_init(spa); + + spa_cos_init(spa); + + spa_special_init(spa); + spa->spa_min_ashift = INT_MAX; spa->spa_max_ashift = 0; + wbc_init(&spa->spa_wbc, spa); /* * As a pool is being created, treat all features as disabled by * setting SPA_FEATURE_DISABLED for all entries in the feature * refcount cache.
*** 741,753 **** if (dp->scd_path != NULL) spa_strfree(dp->scd_path); kmem_free(dp, sizeof (spa_config_dirent_t)); } - avl_destroy(&spa->spa_alloc_tree); list_destroy(&spa->spa_config_list); nvlist_free(spa->spa_label_features); nvlist_free(spa->spa_load_info); spa_config_set(spa, NULL); mutex_enter(&cpu_lock); --- 759,778 ---- if (dp->scd_path != NULL) spa_strfree(dp->scd_path); kmem_free(dp, sizeof (spa_config_dirent_t)); } list_destroy(&spa->spa_config_list); + wbc_fini(&spa->spa_wbc); + + spa_special_fini(spa); + + spa_cos_fini(spa); + + autosnap_fini(spa); + nvlist_free(spa->spa_label_features); nvlist_free(spa->spa_load_info); spa_config_set(spa, NULL); mutex_enter(&cpu_lock);
*** 758,767 **** --- 783,794 ---- refcount_destroy(&spa->spa_refcount); spa_config_lock_destroy(spa); + spa_trimstats_destroy(spa); + kstat_delete(spa->spa_iokstat); spa->spa_iokstat = NULL; for (int t = 0; t < TXG_SIZE; t++) bplist_destroy(&spa->spa_free_bplist[t]);
*** 771,782 **** cv_destroy(&spa->spa_async_cv); cv_destroy(&spa->spa_evicting_os_cv); cv_destroy(&spa->spa_proc_cv); cv_destroy(&spa->spa_scrub_io_cv); cv_destroy(&spa->spa_suspend_cv); - mutex_destroy(&spa->spa_alloc_lock); mutex_destroy(&spa->spa_async_lock); mutex_destroy(&spa->spa_errlist_lock); mutex_destroy(&spa->spa_errlog_lock); mutex_destroy(&spa->spa_evicting_os_lock); mutex_destroy(&spa->spa_history_lock); --- 798,811 ---- cv_destroy(&spa->spa_async_cv); cv_destroy(&spa->spa_evicting_os_cv); cv_destroy(&spa->spa_proc_cv); cv_destroy(&spa->spa_scrub_io_cv); cv_destroy(&spa->spa_suspend_cv); + cv_destroy(&spa->spa_auto_trim_done_cv); + cv_destroy(&spa->spa_man_trim_update_cv); + cv_destroy(&spa->spa_man_trim_done_cv); mutex_destroy(&spa->spa_async_lock); mutex_destroy(&spa->spa_errlist_lock); mutex_destroy(&spa->spa_errlog_lock); mutex_destroy(&spa->spa_evicting_os_lock); mutex_destroy(&spa->spa_history_lock);
*** 785,794 **** --- 814,827 ---- mutex_destroy(&spa->spa_cksum_tmpls_lock); mutex_destroy(&spa->spa_scrub_lock); mutex_destroy(&spa->spa_suspend_lock); mutex_destroy(&spa->spa_vdev_top_lock); mutex_destroy(&spa->spa_iokstat_lock); + mutex_destroy(&spa->spa_cos_props_lock); + mutex_destroy(&spa->spa_vdev_props_lock); + mutex_destroy(&spa->spa_auto_trim_lock); + mutex_destroy(&spa->spa_man_trim_lock); kmem_free(spa, sizeof (spa_t)); } /*
*** 1108,1117 **** --- 1141,1153 ---- uint64_t spa_vdev_enter(spa_t *spa) { mutex_enter(&spa->spa_vdev_top_lock); mutex_enter(&spa_namespace_lock); + mutex_enter(&spa->spa_auto_trim_lock); + mutex_enter(&spa->spa_man_trim_lock); + spa_trim_stop_wait(spa); return (spa_vdev_config_enter(spa)); } /* * Internal implementation for spa_vdev_enter(). Used when a vdev
*** 1156,1165 **** --- 1192,1202 ---- /* * Verify the metaslab classes. */ ASSERT(metaslab_class_validate(spa_normal_class(spa)) == 0); ASSERT(metaslab_class_validate(spa_log_class(spa)) == 0); + ASSERT(metaslab_class_validate(spa_special_class(spa)) == 0); spa_config_exit(spa, SCL_ALL, spa); /* * Panic the system if the specified tag requires it. This
*** 1186,1196 ****
  
      /*
       * If the config changed, update the config cache.
       */
      if (config_changed)
!         spa_write_cachefile(spa, B_FALSE, B_TRUE);
  }
  
  /*
   * Unlock the spa_t after adding or removing a vdev. Besides undoing the
   * locking of spa_vdev_enter(), we also want make sure the transactions have
--- 1223,1233 ----
  
      /*
       * If the config changed, update the config cache.
       */
      if (config_changed)
!         spa_config_sync(spa, B_FALSE, B_TRUE);
  }
  
  /*
   * Unlock the spa_t after adding or removing a vdev. Besides undoing the
   * locking of spa_vdev_enter(), we also want make sure the transactions have
*** 1199,1208 **** --- 1236,1247 ---- */ int spa_vdev_exit(spa_t *spa, vdev_t *vd, uint64_t txg, int error) { spa_vdev_config_exit(spa, vd, txg, error, FTAG); + mutex_exit(&spa->spa_man_trim_lock); + mutex_exit(&spa->spa_auto_trim_lock); mutex_exit(&spa_namespace_lock); mutex_exit(&spa->spa_vdev_top_lock); return (error); }
*** 1270,1280 ****
      /*
       * If the config changed, update the config cache.
       */
      if (config_changed) {
          mutex_enter(&spa_namespace_lock);
!         spa_write_cachefile(spa, B_FALSE, B_TRUE);
          mutex_exit(&spa_namespace_lock);
      }
  
      return (error);
  }
--- 1309,1319 ----
      /*
       * If the config changed, update the config cache.
       */
      if (config_changed) {
          mutex_enter(&spa_namespace_lock);
!         spa_config_sync(spa, B_FALSE, B_TRUE);
          mutex_exit(&spa_namespace_lock);
      }
  
      return (error);
  }
*** 1348,1358 ****
  
      txg_wait_synced(spa->spa_dsl_pool, 0);
  
      /*
       * Sync the updated config cache.
       */
!     spa_write_cachefile(spa, B_FALSE, B_TRUE);
  
      spa_close(spa, FTAG);
  
      mutex_exit(&spa_namespace_lock);
--- 1387,1397 ----
  
      txg_wait_synced(spa->spa_dsl_pool, 0);
  
      /*
       * Sync the updated config cache.
       */
!     spa_config_sync(spa, B_FALSE, B_TRUE);
  
      spa_close(spa, FTAG);
  
      mutex_exit(&spa_namespace_lock);
*** 1406,1415 **** --- 1445,1483 ---- spa_guid_exists(uint64_t pool_guid, uint64_t device_guid) { return (spa_by_guid(pool_guid, device_guid) != NULL); } + /* + * Similar to spa_guid_exists, but uses the spa_config_guid and doesn't + * filter the check by pool state (as spa_guid_exists does). This is + * used to protect against attempting to spa_add the same pool (with the + * same pool GUID) under different names. This situation can happen if + * the boot_archive contains an outdated zpool.cache file after a pool + * rename. That would make us import the pool twice, resulting in data + * corruption. Normally the boot_archive shouldn't contain a zpool.cache + * file, but if due to misconfiguration it does, this function serves as + * a failsafe to prevent the double import. + */ + boolean_t + spa_config_guid_exists(uint64_t pool_guid) + { + spa_t *spa; + + ASSERT(MUTEX_HELD(&spa_namespace_lock)); + if (pool_guid == 0) + return (B_FALSE); + + for (spa = avl_first(&spa_namespace_avl); spa != NULL; + spa = AVL_NEXT(&spa_namespace_avl, spa)) { + if (spa->spa_config_guid == pool_guid) + return (B_TRUE); + } + + return (B_FALSE); + } + char * spa_strdup(const char *s) { size_t len; char *new;
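Editor's note: spa_config_guid_exists() pairs with the early GUID capture in spa_add() (see the *** 620,631 **** hunk above), so the import path can refuse a second import of the same pool under a different name before any further state is built. The fragment below is only a sketch of that intended call pattern, since spa_import() itself is not part of this diff; the EEXIST return is an assumption.

        /*
         * Illustrative fragment: before adding a new spa_t for an import,
         * bail out if a pool with this config GUID is already known under
         * any name (e.g. via a stale zpool.cache in the boot_archive).
         */
        mutex_enter(&spa_namespace_lock);
        if (spa_config_guid_exists(pool_guid)) {
                mutex_exit(&spa_namespace_lock);
                return (SET_ERROR(EEXIST));
        }
        spa = spa_add(pool, config, altroot);
        mutex_exit(&spa_namespace_lock);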
*** 1564,1579 **** spa_is_initializing(spa_t *spa) { return (spa->spa_is_initializing); } - boolean_t - spa_indirect_vdevs_loaded(spa_t *spa) - { - return (spa->spa_indirect_vdevs_loaded); - } - blkptr_t * spa_get_rootblkptr(spa_t *spa) { return (&spa->spa_ubsync.ub_rootbp); } --- 1632,1641 ----
*** 1696,1705 **** --- 1758,1811 ---- { return (lsize * spa_asize_inflation); } /* + * Get either on disk (phys == B_TRUE) or possible in core DDT size + */ + uint64_t + spa_get_ddts_size(spa_t *spa, boolean_t phys) + { + if (phys) + return (spa->spa_ddt_dsize); + + return (spa->spa_ddt_msize); + } + + /* + * Check to see if we need to stop DDT growth to stay within some limit + */ + boolean_t + spa_enable_dedup_cap(spa_t *spa) + { + if (zfs_ddt_byte_ceiling != 0) { + if (zfs_ddts_msize > zfs_ddt_byte_ceiling) { + /* need to limit DDT to an in core bytecount */ + return (B_TRUE); + } + } else if (zfs_ddt_limit_type == DDT_LIMIT_TO_ARC) { + if (zfs_ddts_msize > *arc_ddt_evict_threshold) { + /* need to limit DDT to fit into ARC */ + return (B_TRUE); + } + } else if (zfs_ddt_limit_type == DDT_LIMIT_TO_L2ARC) { + if (spa->spa_l2arc_ddt_devs_size != 0) { + if (spa_get_ddts_size(spa, B_TRUE) > + spa->spa_l2arc_ddt_devs_size) { + /* limit DDT to fit into L2ARC DDT dev */ + return (B_TRUE); + } + } else if (zfs_ddts_msize > *arc_ddt_evict_threshold) { + /* no L2ARC DDT dev - limit DDT to fit into ARC */ + return (B_TRUE); + } + } + + return (B_FALSE); + } + + /* * Return the amount of slop space in bytes. It is 1/32 of the pool (3.2%), * or at least 128MB, unless that would cause it to be more than half the * pool size. * * See the comment above spa_slop_shift for details.
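Editor's note: spa_enable_dedup_cap() centralizes the NEX-3165 policy: stop DDT growth at an absolute byte ceiling if zfs_ddt_byte_ceiling is set, otherwise against the ARC eviction threshold, or against the capacity of dedicated L2ARC DDT devices when zfs_ddt_limit_type selects L2ARC. Below is a hedged sketch of how a write path could consult it; the spa_ddt_capped flag and the zp_dedup override shown here are assumptions (compare NEX-5856), not code from this hunk.

        /*
         * Illustrative fragment: once the cap engages, mark the pool as
         * capped and write new blocks without dedup so the DDT stops
         * growing.
         */
        if (spa_enable_dedup_cap(spa)) {
                spa->spa_ddt_capped = B_TRUE;   /* assumed field */
                zp->zp_dedup = B_FALSE;         /* fall back to normal write */
        }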
*** 1720,1749 **** void spa_update_dspace(spa_t *spa) { spa->spa_dspace = metaslab_class_get_dspace(spa_normal_class(spa)) + ddt_get_dedup_dspace(spa); ! if (spa->spa_vdev_removal != NULL) { /* ! * We can't allocate from the removing device, so ! * subtract its size. This prevents the DMU/DSL from ! * filling up the (now smaller) pool while we are in the ! * middle of removing the device. ! * ! * Note that the DMU/DSL doesn't actually know or care ! * how much space is allocated (it does its own tracking ! * of how much space has been logically used). So it ! * doesn't matter that the data we are moving may be ! * allocated twice (on the old device and the new ! * device). */ ! vdev_t *vd = spa->spa_vdev_removal->svr_vdev; ! spa->spa_dspace -= spa_deflate(spa) ? ! vd->vdev_stat.vs_dspace : vd->vdev_stat.vs_space; } } /* * Return the failure mode that has been set to this pool. The default * behavior will be to block all I/Os when a complete failure occurs. */ uint8_t --- 1826,1880 ---- void spa_update_dspace(spa_t *spa) { spa->spa_dspace = metaslab_class_get_dspace(spa_normal_class(spa)) + ddt_get_dedup_dspace(spa); ! } ! ! /* ! * EXPERIMENTAL ! * Use exponential moving average to track root vdev iotime, as well as top ! * level vdev iotime. ! * The principle: avg_new = avg_prev + (cur - avg_prev) * a / 100; a is ! * tuneable. For example, if a = 10 (alpha = 0.1), it will take 20 iterations, ! * or 100 seconds at 5 second txg commit intervals for the values from last 20 ! * iterations to account for 66% of the moving average. ! * Currently, the challenge is that we keep track of iotime in cumulative ! * nanoseconds since zpool import, both for leaf and top vdevs, so a way of ! * getting delta pre/post txg commit is required. ! */ ! ! void ! spa_update_latency(spa_t *spa) ! { ! vdev_t *rvd = spa->spa_root_vdev; ! vdev_stat_t *rvs = &rvd->vdev_stat; ! for (int c = 0; c < rvd->vdev_children; c++) { ! vdev_t *cvd = rvd->vdev_child[c]; ! vdev_stat_t *cvs = &cvd->vdev_stat; ! mutex_enter(&rvd->vdev_stat_lock); ! ! for (int t = 0; t < ZIO_TYPES; t++) { ! /* ! * Non-trivial bit here. We update the moving latency ! * average for each child vdev separately, but since we ! * want the average to settle at the same rate ! * regardless of top level vdev count, we effectively ! * divide our alpha by number of children of the root ! * vdev to account for that. */ ! rvs->vs_latency[t] += ((((int64_t)cvs->vs_latency[t] - ! (int64_t)rvs->vs_latency[t]) * ! (int64_t)zfs_root_latency_alpha) / 100) / ! (int64_t)(rvd->vdev_children); } + mutex_exit(&rvd->vdev_stat_lock); + } } + /* * Return the failure mode that has been set to this pool. The default * behavior will be to block all I/Os when a complete failure occurs. */ uint8_t
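Editor's note: the spa_update_latency() comment gives the recurrence avg_new = avg_prev + (cur - avg_prev) * a / 100, with a = zfs_root_latency_alpha. The small stand-alone program below (plain user-land C, not kernel code) simply iterates that recurrence with a = 10 and a constant input so the convergence behaviour described above can be observed directly.

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            int64_t alpha = 10;       /* zfs_root_latency_alpha default */
            int64_t avg = 0;          /* rolling average, nanoseconds */
            int64_t cur = 1000000;    /* constant observed latency */

            /* avg_new = avg_prev + (cur - avg_prev) * a / 100 */
            for (int i = 1; i <= 30; i++) {
                    avg += (cur - avg) * alpha / 100;
                    printf("iteration %2d: avg = %lld ns\n",
                        i, (long long)avg);
            }
            return (0);
    }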
*** 1762,1771 **** --- 1893,1908 ---- spa_version(spa_t *spa) { return (spa->spa_ubsync.ub_version); } + int + spa_get_obj_mtx_sz(spa_t *spa) + { + return (spa->spa_obj_mtx_sz); + } + boolean_t spa_deflate(spa_t *spa) { return (spa->spa_deflate); }
*** 1780,1789 **** --- 1917,1932 ---- spa_log_class(spa_t *spa) { return (spa->spa_log_class); } + metaslab_class_t * + spa_special_class(spa_t *spa) + { + return (spa->spa_special_class); + } + void spa_evicting_os_register(spa_t *spa, objset_t *os) { mutex_enter(&spa->spa_evicting_os_lock); list_insert_head(&spa->spa_evicting_os_list, os);
*** 1808,1817 **** --- 1951,1970 ---- mutex_exit(&spa->spa_evicting_os_lock); dmu_buf_user_evict_wait(); } + uint64_t + spa_class_alloc_percentage(metaslab_class_t *mc) + { + uint64_t capacity = mc->mc_space; + uint64_t alloc = mc->mc_alloc; + uint64_t one_percent = capacity / 100; + + return (alloc / one_percent); + } + int spa_max_replication(spa_t *spa) { /* * As of SPA_VERSION == SPA_VERSION_DITTO_BLOCKS, we are able to
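Editor's note: spa_class_alloc_percentage() reports a whole-number percentage using integer arithmetic: with mc_space = 10 TiB and mc_alloc = 2.5 TiB, one_percent is 102.4 GiB and the function returns 25. Fractions are truncated, so 2.59 TiB allocated still reports 25.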
*** 1833,1842 **** --- 1986,2007 ---- spa_deadman_synctime(spa_t *spa) { return (spa->spa_deadman_synctime); } + spa_force_trim_t + spa_get_force_trim(spa_t *spa) + { + return (spa->spa_force_trim); + } + + spa_auto_trim_t + spa_get_auto_trim(spa_t *spa) + { + return (spa->spa_auto_trim); + } + uint64_t dva_get_dsize_sync(spa_t *spa, const dva_t *dva) { uint64_t asize = DVA_GET_ASIZE(dva); uint64_t dsize = asize;
*** 1849,1878 **** } return (dsize); } uint64_t bp_get_dsize_sync(spa_t *spa, const blkptr_t *bp) { uint64_t dsize = 0; ! for (int d = 0; d < BP_GET_NDVAS(bp); d++) dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]); return (dsize); } uint64_t bp_get_dsize(spa_t *spa, const blkptr_t *bp) { ! uint64_t dsize = 0; spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER); ! for (int d = 0; d < BP_GET_NDVAS(bp); d++) ! dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]); spa_config_exit(spa, SCL_VDEV, FTAG); return (dsize); } --- 2014,2054 ---- } return (dsize); } + /* + * This function walks over the all DVAs of the given BP and + * adds up their sizes. + */ uint64_t bp_get_dsize_sync(spa_t *spa, const blkptr_t *bp) { + /* + * SPECIAL-BP has two DVAs, but DVA[0] in this case is a + * temporary DVA, and after migration only the DVA[1] + * contains valid data. Therefore, we start walking for + * these BPs from DVA[1]. + */ + int start_dva = BP_IS_SPECIAL(bp) ? 1 : 0; uint64_t dsize = 0; ! for (int d = start_dva; d < BP_GET_NDVAS(bp); d++) { dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]); + } return (dsize); } uint64_t bp_get_dsize(spa_t *spa, const blkptr_t *bp) { ! uint64_t dsize; spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER); ! dsize = bp_get_dsize_sync(spa, bp); spa_config_exit(spa, SCL_VDEV, FTAG); return (dsize); }
*** 1927,1936 **** --- 2103,2120 ---- avl_create(&spa_l2cache_avl, spa_l2cache_compare, sizeof (spa_aux_t), offsetof(spa_aux_t, aux_avl)); spa_mode_global = mode; + /* + * logevent_max_q_sz from log_sysevent.c gives us upper bound on + * the number of taskq entries; queueing of sysevents is serialized, + * so there is no need for more than one worker thread + */ + spa_sysevent_taskq = taskq_create("spa_sysevent_tq", 1, + minclsyspri, 1, 5000, TASKQ_DYNAMIC); + #ifdef _KERNEL spa_arch_init(); #else if (spa_mode_global != FREAD && dprintf_find_string("watch")) { arc_procfd = open("/proc/self/ctl", O_WRONLY);
*** 1952,1968 **** --- 2136,2158 ---- zil_init(); vdev_cache_stat_init(); zfs_prop_init(); zpool_prop_init(); zpool_feature_init(); + vdev_prop_init(); + cos_prop_init(); spa_config_load(); l2arc_start(); + ddt_init(); + dsl_scan_global_init(); } void spa_fini(void) { + ddt_fini(); + l2arc_stop(); spa_evict_all(); vdev_cache_stat_fini();
*** 1972,1981 **** --- 2162,2173 ---- metaslab_alloc_trace_fini(); range_tree_fini(); unique_fini(); refcount_fini(); + taskq_destroy(spa_sysevent_taskq); + avl_destroy(&spa_namespace_avl); avl_destroy(&spa_spare_avl); avl_destroy(&spa_l2cache_avl); cv_destroy(&spa_namespace_cv);
*** 2014,2024 ****
  }
  
  boolean_t
  spa_writeable(spa_t *spa)
  {
!     return (!!(spa->spa_mode & FWRITE) && spa->spa_trust_config);
  }
  
  /*
   * Returns true if there is a pending sync task in any of the current
   * syncing txg, the current quiescing txg, or the current open txg.
--- 2206,2216 ----
  }
  
  boolean_t
  spa_writeable(spa_t *spa)
  {
!     return (!!(spa->spa_mode & FWRITE));
  }
  
  /*
   * Returns true if there is a pending sync task in any of the current
   * syncing txg, the current quiescing txg, or the current open txg.
*** 2027,2036 **** --- 2219,2234 ---- spa_has_pending_synctask(spa_t *spa) { return (!txg_all_lists_empty(&spa->spa_dsl_pool->dp_sync_tasks)); } + boolean_t + spa_has_special(spa_t *spa) + { + return (spa->spa_special_class->mc_rotor != NULL); + } + int spa_mode(spa_t *spa) { return (spa->spa_mode); }
*** 2071,2080 **** --- 2269,2279 ---- spa->spa_scan_pass_scrub_pause = spa->spa_scan_pass_start; else spa->spa_scan_pass_scrub_pause = 0; spa->spa_scan_pass_scrub_spent_paused = 0; spa->spa_scan_pass_exam = 0; + spa->spa_scan_pass_work = 0; vdev_scan_stat_init(spa->spa_root_vdev); } /* * Get scan stats for zpool status reports
*** 2096,2109 **** --- 2295,2312 ---- ps->pss_examined = scn->scn_phys.scn_examined; ps->pss_to_process = scn->scn_phys.scn_to_process; ps->pss_processed = scn->scn_phys.scn_processed; ps->pss_errors = scn->scn_phys.scn_errors; ps->pss_state = scn->scn_phys.scn_state; + mutex_enter(&scn->scn_status_lock); + ps->pss_issued = scn->scn_bytes_issued; + mutex_exit(&scn->scn_status_lock); /* data not stored on disk */ ps->pss_pass_start = spa->spa_scan_pass_start; ps->pss_pass_exam = spa->spa_scan_pass_exam; + ps->pss_pass_work = spa->spa_scan_pass_work; ps->pss_pass_scrub_pause = spa->spa_scan_pass_scrub_pause; ps->pss_pass_scrub_spent_paused = spa->spa_scan_pass_scrub_spent_paused; return (0); }
*** 2121,2184 **** return (SPA_MAXBLOCKSIZE); else return (SPA_OLD_MAXBLOCKSIZE); } /* ! * Returns the txg that the last device removal completed. No indirect mappings ! * have been added since this txg. */ ! uint64_t ! spa_get_last_removal_txg(spa_t *spa) { ! uint64_t vdevid; ! uint64_t ret = -1ULL; ! spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER); ! /* ! * sr_prev_indirect_vdev is only modified while holding all the ! * config locks, so it is sufficient to hold SCL_VDEV as reader when ! * examining it. ! */ ! vdevid = spa->spa_removing_phys.sr_prev_indirect_vdev; ! while (vdevid != -1ULL) { ! vdev_t *vd = vdev_lookup_top(spa, vdevid); ! vdev_indirect_births_t *vib = vd->vdev_indirect_births; ! ASSERT3P(vd->vdev_ops, ==, &vdev_indirect_ops); ! /* ! * If the removal did not remap any data, we don't care. */ ! if (vdev_indirect_births_count(vib) != 0) { ! ret = vdev_indirect_births_last_entry_txg(vib); ! break; } ! vdevid = vd->vdev_indirect_config.vic_prev_indirect_vdev; } ! spa_config_exit(spa, SCL_VDEV, FTAG); ! IMPLY(ret != -1ULL, ! spa_feature_is_active(spa, SPA_FEATURE_DEVICE_REMOVAL)); ! return (ret); } ! boolean_t ! spa_trust_config(spa_t *spa) { ! return (spa->spa_trust_config); } ! uint64_t ! spa_missing_tvds_allowed(spa_t *spa) { ! return (spa->spa_missing_tvds_allowed); } void ! spa_set_missing_tvds(spa_t *spa, uint64_t missing) { ! spa->spa_missing_tvds = missing; } --- 2324,2530 ---- return (SPA_MAXBLOCKSIZE); else return (SPA_OLD_MAXBLOCKSIZE); } + boolean_t + spa_wbc_present(spa_t *spa) + { + return (spa->spa_wbc_mode != WBC_MODE_OFF); + } + + boolean_t + spa_wbc_active(spa_t *spa) + { + return (spa->spa_wbc_mode == WBC_MODE_ACTIVE); + } + + int + spa_wbc_mode(const char *name) + { + int ret = 0; + spa_t *spa; + + mutex_enter(&spa_namespace_lock); + spa = spa_lookup(name); + if (!spa) { + mutex_exit(&spa_namespace_lock); + return (-1); + } + + ret = (int)spa->spa_wbc_mode; + mutex_exit(&spa_namespace_lock); + return (ret); + } + + struct zfs_autosnap * + spa_get_autosnap(spa_t *spa) + { + return (&spa->spa_autosnap); + } + + wbc_data_t * + spa_get_wbc_data(spa_t *spa) + { + return (&spa->spa_wbc); + } + /* ! * Creates the trim kstats structure for a spa. */ ! static void ! spa_trimstats_create(spa_t *spa) { ! /* truncate pool name to accomodate "_trimstats" suffix */ ! char short_spa_name[KSTAT_STRLEN - 10]; ! char name[KSTAT_STRLEN]; ! ASSERT3P(spa->spa_trimstats, ==, NULL); ! ASSERT3P(spa->spa_trimstats_ks, ==, NULL); ! (void) snprintf(short_spa_name, sizeof (short_spa_name), "%s", ! spa->spa_name); ! (void) snprintf(name, sizeof (name), "%s_trimstats", short_spa_name); ! spa->spa_trimstats_ks = kstat_create("zfs", 0, name, "misc", ! KSTAT_TYPE_NAMED, sizeof (*spa->spa_trimstats) / ! sizeof (kstat_named_t), 0); ! if (spa->spa_trimstats_ks) { ! spa->spa_trimstats = spa->spa_trimstats_ks->ks_data; ! #ifdef _KERNEL ! kstat_named_init(&spa->spa_trimstats->st_extents, ! "extents", KSTAT_DATA_UINT64); ! kstat_named_init(&spa->spa_trimstats->st_bytes, ! "bytes", KSTAT_DATA_UINT64); ! kstat_named_init(&spa->spa_trimstats->st_extents_skipped, ! "extents_skipped", KSTAT_DATA_UINT64); ! kstat_named_init(&spa->spa_trimstats->st_bytes_skipped, ! "bytes_skipped", KSTAT_DATA_UINT64); ! kstat_named_init(&spa->spa_trimstats->st_auto_slow, ! "auto_slow", KSTAT_DATA_UINT64); ! #endif /* _KERNEL */ ! ! kstat_install(spa->spa_trimstats_ks); ! } else { ! cmn_err(CE_NOTE, "!Cannot create trim kstats for pool %s", ! spa->spa_name); ! } ! } ! ! /* ! * Destroys the trim kstats for a spa. */ ! 
static void ! spa_trimstats_destroy(spa_t *spa) ! { ! if (spa->spa_trimstats_ks) { ! kstat_delete(spa->spa_trimstats_ks); ! spa->spa_trimstats = NULL; ! spa->spa_trimstats_ks = NULL; } + } ! /* ! * Updates the numerical trim kstats for a spa. ! */ ! void ! spa_trimstats_update(spa_t *spa, uint64_t extents, uint64_t bytes, ! uint64_t extents_skipped, uint64_t bytes_skipped) ! { ! spa_trimstats_t *st = spa->spa_trimstats; ! if (st) { ! atomic_add_64(&st->st_extents.value.ui64, extents); ! atomic_add_64(&st->st_bytes.value.ui64, bytes); ! atomic_add_64(&st->st_extents_skipped.value.ui64, ! extents_skipped); ! atomic_add_64(&st->st_bytes_skipped.value.ui64, ! bytes_skipped); } ! } ! /* ! * Increments the slow-trim kstat for a spa. ! */ ! void ! spa_trimstats_auto_slow_incr(spa_t *spa) ! { ! spa_trimstats_t *st = spa->spa_trimstats; ! if (st) ! atomic_inc_64(&st->st_auto_slow.value.ui64); ! } ! /* ! * Creates the taskq used for dispatching auto-trim. This is called only when ! * the property is set to `on' or when the pool is loaded (and the autotrim ! * property is `on'). ! */ ! void ! spa_auto_trim_taskq_create(spa_t *spa) ! { ! char name[MAXPATHLEN]; ! ASSERT(MUTEX_HELD(&spa->spa_auto_trim_lock)); ! ASSERT(spa->spa_auto_trim_taskq == NULL); ! (void) snprintf(name, sizeof (name), "%s_auto_trim", spa->spa_name); ! spa->spa_auto_trim_taskq = taskq_create(name, 1, minclsyspri, 1, ! spa->spa_root_vdev->vdev_children, TASKQ_DYNAMIC); ! VERIFY(spa->spa_auto_trim_taskq != NULL); } ! /* ! * Creates the taskq for dispatching manual trim. This taskq is recreated ! * each time `zpool trim <poolname>' is issued and destroyed after the run ! * completes in an async spa request. ! */ ! void ! spa_man_trim_taskq_create(spa_t *spa) { ! char name[MAXPATHLEN]; ! ASSERT(MUTEX_HELD(&spa->spa_man_trim_lock)); ! spa_async_unrequest(spa, SPA_ASYNC_MAN_TRIM_TASKQ_DESTROY); ! if (spa->spa_man_trim_taskq != NULL) ! /* ! * The async taskq destroy has been pre-empted, so just ! * return, the taskq is still good to use. ! */ ! return; ! (void) snprintf(name, sizeof (name), "%s_man_trim", spa->spa_name); ! spa->spa_man_trim_taskq = taskq_create(name, 1, minclsyspri, 1, ! spa->spa_root_vdev->vdev_children, TASKQ_DYNAMIC); ! VERIFY(spa->spa_man_trim_taskq != NULL); } ! /* ! * Destroys the taskq created in spa_auto_trim_taskq_create. The taskq ! * is only destroyed when the autotrim property is set to `off'. ! */ ! void ! spa_auto_trim_taskq_destroy(spa_t *spa) { ! ASSERT(MUTEX_HELD(&spa->spa_auto_trim_lock)); ! ASSERT(spa->spa_auto_trim_taskq != NULL); ! while (spa->spa_num_auto_trimming != 0) ! cv_wait(&spa->spa_auto_trim_done_cv, &spa->spa_auto_trim_lock); ! taskq_destroy(spa->spa_auto_trim_taskq); ! spa->spa_auto_trim_taskq = NULL; } + /* + * Destroys the taskq created in spa_man_trim_taskq_create. The taskq is + * destroyed after a manual trim run completes from an async spa request. + * There is a bit of lag between an async request being issued at the + * completion of a trim run and it finally being acted on, hence why this + * function checks if new manual trimming threads haven't been re-spawned. + * If they have, we assume the async spa request been preempted by another + * manual trim request and we back off. + */ void ! spa_man_trim_taskq_destroy(spa_t *spa) { ! ASSERT(MUTEX_HELD(&spa->spa_man_trim_lock)); ! ASSERT(spa->spa_man_trim_taskq != NULL); ! if (spa->spa_num_man_trimming != 0) ! /* another trim got started before we got here, back off */ ! return; ! taskq_destroy(spa->spa_man_trim_taskq); ! 
spa->spa_man_trim_taskq = NULL; }
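Editor's note: the per-pool trim kstats created above are meant to be bumped from the TRIM I/O path; the fragment below is a hedged sketch of a caller, with the counter variable names invented for illustration.

        /*
         * Illustrative fragment: after deciding which extents are actually
         * queued as TRIM zios, update the pool-wide counters in one call.
         */
        spa_trimstats_update(spa, issued_extents, issued_bytes,
            skipped_extents, skipped_bytes);

        /* auto-trim could not keep up and extents were dropped */
        if (dropped_extents != 0)
                spa_trimstats_auto_slow_incr(spa);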