NEX-19592 zfs_dbgmsg should not contain info calculated latency
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-17348 The ZFS deadman timer is currently set too high
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
NEX-9200 Improve the scalability of attribute locking in zfs_zget
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-13140 DVA-throttle support for special-class
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9989 Changing volume names can result in double imports and data corruption
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-10069 ZFS_READONLY is a little too strict (fix test lint)
NEX-9553 Move ss_fill gap logic from scan algorithm into range_tree.c
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-6088 ZFS scrub/resilver take excessively long due to issuing lots of random IO
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5856 ddt_capped isn't reset when deduped dataset is destroyed
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-5553 ZFS auto-trim, manual-trim and scrub can race and deadlock
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5188 Removed special-vdev causes panic on read or on get size of special-bp
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5186 smf-tests contains built files and it shouldn't
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-5168 cleanup and productize non-default latency based writecache load-balancer
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-3729 KRRP changes mess up iostat(1M)
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-4807 writecache load-balancing statistics: several distinct problems, must be revisited and revised
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4876 On-demand TRIM shouldn't use system_taskq and should queue jobs
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4683 WRC: Special block pointer must know that it is special
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-4677 Fix for NEX-4619 build breakage
NEX-4620 ZFS autotrim triggering is unreliable
NEX-4622 On-demand TRIM code illogically enumerates metaslabs via mg_ms_tree
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
NEX-4619 Want kstats to monitor TRIM and UNMAP operation
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5818 zfs {ref}compressratio is incorrect with 4k sector size
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4245 WRC: Code cleanup and refactoring to simplify merge with upstream
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4203 spa_config_tryenter incorrectly handles the multiple-lock case
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3965 System may panic on the importing of pool with WRC
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Revert "NEX-3965 System may panic on the importing of pool with WRC"
This reverts commit 45bc50222913cddafde94621d28b78d6efaea897.
NEX-3984 On-demand TRIM
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Conflicts:
        usr/src/common/zfs/zpool_prop.c
        usr/src/uts/common/sys/fs/zfs.h
NEX-3965 System may panic on the importing of pool with WRC
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Conflicts:
    usr/src/uts/common/io/scsi/targets/sd.c
    usr/src/uts/common/sys/scsi/targets/sddef.h
NEX-3165 need some dedup improvements
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
4391 panic system rather than corrupting pool if we hit bug 4390
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-114 Heap leak when exporting/destroying pools with CoS
SUP-577 deadlock between zpool detach and syseventd
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Fixup merge results
re #13333 rb4362 - eliminated spa_update_iotime() to fix the stats
re #12643 rb4064 ZFS meta refactoring - vdev utilization tracking, auto-dedup
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
re #8346 rb2639 KT disk failures
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties

*** 19,30 ****
   * CDDL HEADER END
   */
  
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
   * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
- * Copyright 2015 Nexenta Systems, Inc. All rights reserved.
   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
   * Copyright 2013 Saso Kiselkov. All rights reserved.
   * Copyright (c) 2014 Integros [integros.com]
   * Copyright (c) 2017 Datto Inc.
   */
--- 19,30 ----
   * CDDL HEADER END
   */
  
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
   * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
+ * Copyright 2019 Nexenta Systems, Inc. All rights reserved.
   * Copyright 2013 Saso Kiselkov. All rights reserved.
   * Copyright (c) 2014 Integros [integros.com]
   * Copyright (c) 2017 Datto Inc.
   */
*** 50,59 ****
--- 50,60 ----
  #include <sys/dsl_scan.h>
  #include <sys/fs/zfs.h>
  #include <sys/metaslab_impl.h>
  #include <sys/arc.h>
  #include <sys/ddt.h>
+ #include <sys/cos.h>
  #include "zfs_prop.h"
  #include <sys/zfeature.h>
  
  /*
   * SPA locking
*** 224,233 **** --- 225,242 ---- * * spa_rename() is also implemented within this file since it requires * manipulation of the namespace. */ + struct spa_trimstats { + kstat_named_t st_extents; /* # of extents issued to zio */ + kstat_named_t st_bytes; /* # of bytes issued to zio */ + kstat_named_t st_extents_skipped; /* # of extents too small */ + kstat_named_t st_bytes_skipped; /* bytes in extents_skipped */ + kstat_named_t st_auto_slow; /* trim slow, exts dropped */ + }; + static avl_tree_t spa_namespace_avl; kmutex_t spa_namespace_lock; static kcondvar_t spa_namespace_cv; static int spa_active_count; int spa_max_replication_override = SPA_DVAS_PER_BP;
*** 239,257 **** kmem_cache_t *spa_buffer_pool; int spa_mode_global; #ifdef ZFS_DEBUG ! /* ! * Everything except dprintf, spa, and indirect_remap is on by default ! * in debug builds. ! */ ! int zfs_flags = ~(ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SPA | ZFS_DEBUG_INDIRECT_REMAP); #else int zfs_flags = 0; #endif /* * zfs_recover can be set to nonzero to attempt to recover from * otherwise-fatal errors, typically caused by on-disk corruption. When * set, calls to zfs_panic_recover() will turn into warning messages. * This should only be used as a last resort, as it typically results --- 248,266 ---- kmem_cache_t *spa_buffer_pool; int spa_mode_global; #ifdef ZFS_DEBUG ! /* Everything except dprintf and spa is on by default in debug builds */ ! int zfs_flags = ~(ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SPA); #else int zfs_flags = 0; #endif + #define ZFS_OBJ_MTX_DEFAULT_SZ 64 + uint64_t spa_obj_mtx_sz = ZFS_OBJ_MTX_DEFAULT_SZ; + /* * zfs_recover can be set to nonzero to attempt to recover from * otherwise-fatal errors, typically caused by on-disk corruption. When * set, calls to zfs_panic_recover() will turn into warning messages. * This should only be used as a last resort, as it typically results
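Editor's note: the new spa_obj_mtx_sz tunable (defaulting to ZFS_OBJ_MTX_DEFAULT_SZ, i.e. 64) sizes the per-objset array of object mutexes that NEX-9200 uses to spread zfs_zget() attribute locking across buckets instead of a single lock. The sketch below only illustrates the bucketing idea; the array and function names are hypothetical, since the actual consumer is not part of this hunk.

    #include <stdint.h>

    #define SPA_OBJ_MTX_SZ  64      /* mirrors ZFS_OBJ_MTX_DEFAULT_SZ above */

    /* stand-in lock type so the sketch is self-contained */
    typedef struct obj_lock { int held; } obj_lock_t;

    static obj_lock_t obj_locks[SPA_OBJ_MTX_SZ];

    /*
     * Hash the object number to one of the buckets (the size must be a
     * power of two for the mask to work), so lookups of different objects
     * rarely contend on the same mutex.
     */
    static obj_lock_t *
    obj_lock_for(uint64_t obj)
    {
            return (&obj_locks[obj & (SPA_OBJ_MTX_SZ - 1)]);
    }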
*** 289,306 **** * leaking space in the "partial temporary" failure case. */ boolean_t zfs_free_leak_on_eio = B_FALSE; /* * Expiration time in milliseconds. This value has two meanings. First it is * used to determine when the spa_deadman() logic should fire. By default the ! * spa_deadman() will fire if spa_sync() has not completed in 1000 seconds. * Secondly, the value determines if an I/O is considered "hung". Any I/O that * has not completed in zfs_deadman_synctime_ms is considered "hung" resulting * in a system panic. */ ! uint64_t zfs_deadman_synctime_ms = 1000000ULL; /* * Check time in milliseconds. This defines the frequency at which we check * for hung I/O. */ --- 298,321 ---- * leaking space in the "partial temporary" failure case. */ boolean_t zfs_free_leak_on_eio = B_FALSE; /* + * alpha for spa_update_latency() rolling average of pool latency, which + * is updated on every txg commit. + */ + int64_t zfs_root_latency_alpha = 10; + + /* * Expiration time in milliseconds. This value has two meanings. First it is * used to determine when the spa_deadman() logic should fire. By default the ! * spa_deadman() will fire if spa_sync() has not completed in 250 seconds. * Secondly, the value determines if an I/O is considered "hung". Any I/O that * has not completed in zfs_deadman_synctime_ms is considered "hung" resulting * in a system panic. */ ! uint64_t zfs_deadman_synctime_ms = 250000ULL; /* * Check time in milliseconds. This defines the frequency at which we check * for hung I/O. */
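Editor's note: the hunk above lowers the deadman default from 1000 to 250 seconds (zfs_deadman_synctime_ms = 250000, per NEX-17348) and adds the zfs_root_latency_alpha tunable used later by spa_update_latency(). The same deadman value serves both purposes described in the comment; the following is a minimal user-land sketch of the "hung I/O" half of that check (names are illustrative, not the kernel code):

    #include <stdbool.h>
    #include <stdint.h>

    /* mirrors the new default above: 250000 ms = 250 s */
    static uint64_t zfs_deadman_synctime_ms = 250000ULL;

    /*
     * An I/O is considered "hung" once it has been outstanding for longer
     * than zfs_deadman_synctime_ms; the comparison is done in nanoseconds.
     */
    static bool
    io_is_hung(uint64_t io_start_ns, uint64_t now_ns)
    {
            return (now_ns - io_start_ns >
                zfs_deadman_synctime_ms * 1000000ULL);
    }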
*** 352,391 **** * See also the comments in zfs_space_check_t. */ int spa_slop_shift = 5; uint64_t spa_min_slop = 128 * 1024 * 1024; ! /*PRINTFLIKE2*/ ! void ! spa_load_failed(spa_t *spa, const char *fmt, ...) ! { ! va_list adx; ! char buf[256]; - va_start(adx, fmt); - (void) vsnprintf(buf, sizeof (buf), fmt, adx); - va_end(adx); - - zfs_dbgmsg("spa_load(%s, config %s): FAILED: %s", spa->spa_name, - spa->spa_trust_config ? "trusted" : "untrusted", buf); - } - - /*PRINTFLIKE2*/ - void - spa_load_note(spa_t *spa, const char *fmt, ...) - { - va_list adx; - char buf[256]; - - va_start(adx, fmt); - (void) vsnprintf(buf, sizeof (buf), fmt, adx); - va_end(adx); - - zfs_dbgmsg("spa_load(%s, config %s): %s", spa->spa_name, - spa->spa_trust_config ? "trusted" : "untrusted", buf); - } - /* * ========================================================================== * SPA config locking * ========================================================================== */ --- 367,379 ---- * See also the comments in zfs_space_check_t. */ int spa_slop_shift = 5; uint64_t spa_min_slop = 128 * 1024 * 1024; ! static void spa_trimstats_create(spa_t *spa); ! static void spa_trimstats_destroy(spa_t *spa); /* * ========================================================================== * SPA config locking * ========================================================================== */
*** 474,484 ****
              scl->scl_writer = curthread;
          }
          (void) refcount_add(&scl->scl_count, tag);
          mutex_exit(&scl->scl_lock);
      }
!     ASSERT3U(wlocks_held, <=, locks);
  }
  
  void
  spa_config_exit(spa_t *spa, int locks, void *tag)
  {
--- 462,472 ----
              scl->scl_writer = curthread;
          }
          (void) refcount_add(&scl->scl_count, tag);
          mutex_exit(&scl->scl_lock);
      }
!     ASSERT(wlocks_held <= locks);
  }
  
  void
  spa_config_exit(spa_t *spa, int locks, void *tag)
  {
*** 585,594 **** --- 573,583 ---- { spa_t *spa; spa_config_dirent_t *dp; cyc_handler_t hdlr; cyc_time_t when; + uint64_t guid; ASSERT(MUTEX_HELD(&spa_namespace_lock)); spa = kmem_zalloc(sizeof (spa_t), KM_SLEEP);
*** 602,618 **** mutex_init(&spa->spa_cksum_tmpls_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_scrub_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_suspend_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_vdev_top_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_iokstat_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&spa->spa_alloc_lock, NULL, MUTEX_DEFAULT, NULL); cv_init(&spa->spa_async_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_evicting_os_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_proc_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_scrub_io_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_suspend_cv, NULL, CV_DEFAULT, NULL); for (int t = 0; t < TXG_SIZE; t++) bplist_create(&spa->spa_free_bplist[t]); (void) strlcpy(spa->spa_name, name, sizeof (spa->spa_name)); --- 591,615 ---- mutex_init(&spa->spa_cksum_tmpls_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_scrub_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_suspend_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_vdev_top_lock, NULL, MUTEX_DEFAULT, NULL); mutex_init(&spa->spa_iokstat_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&spa->spa_cos_props_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&spa->spa_vdev_props_lock, NULL, MUTEX_DEFAULT, NULL); ! mutex_init(&spa->spa_perfmon.perfmon_lock, NULL, MUTEX_DEFAULT, NULL); + mutex_init(&spa->spa_auto_trim_lock, NULL, MUTEX_DEFAULT, NULL); + mutex_init(&spa->spa_man_trim_lock, NULL, MUTEX_DEFAULT, NULL); + cv_init(&spa->spa_async_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_evicting_os_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_proc_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_scrub_io_cv, NULL, CV_DEFAULT, NULL); cv_init(&spa->spa_suspend_cv, NULL, CV_DEFAULT, NULL); + cv_init(&spa->spa_auto_trim_done_cv, NULL, CV_DEFAULT, NULL); + cv_init(&spa->spa_man_trim_update_cv, NULL, CV_DEFAULT, NULL); + cv_init(&spa->spa_man_trim_done_cv, NULL, CV_DEFAULT, NULL); for (int t = 0; t < TXG_SIZE; t++) bplist_create(&spa->spa_free_bplist[t]); (void) strlcpy(spa->spa_name, name, sizeof (spa->spa_name));
*** 620,631 **** spa->spa_freeze_txg = UINT64_MAX; spa->spa_final_txg = UINT64_MAX; spa->spa_load_max_txg = UINT64_MAX; spa->spa_proc = &p0; spa->spa_proc_state = SPA_PROC_NONE; ! spa->spa_trust_config = B_TRUE; hdlr.cyh_func = spa_deadman; hdlr.cyh_arg = spa; hdlr.cyh_level = CY_LOW_LEVEL; spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms); --- 617,643 ---- spa->spa_freeze_txg = UINT64_MAX; spa->spa_final_txg = UINT64_MAX; spa->spa_load_max_txg = UINT64_MAX; spa->spa_proc = &p0; spa->spa_proc_state = SPA_PROC_NONE; ! if (spa_obj_mtx_sz < 1 || spa_obj_mtx_sz > INT_MAX) ! spa->spa_obj_mtx_sz = ZFS_OBJ_MTX_DEFAULT_SZ; ! else ! spa->spa_obj_mtx_sz = spa_obj_mtx_sz; + /* + * Grabbing the guid here is just so that spa_config_guid_exists can + * check early on to protect against doubled imports of the same pool + * under different names. If the GUID isn't provided here, we will + * let spa generate one later on during spa_load, although in that + * case we might not be able to provide the double-import protection. + */ + if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID, &guid) == 0) { + spa->spa_config_guid = guid; + ASSERT(!spa_config_guid_exists(guid)); + } + hdlr.cyh_func = spa_deadman; hdlr.cyh_arg = spa; hdlr.cyh_level = CY_LOW_LEVEL; spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms);
*** 653,665 **** if (altroot) { spa->spa_root = spa_strdup(altroot); spa_active_count++; } - avl_create(&spa->spa_alloc_tree, zio_bookmark_compare, - sizeof (zio_t), offsetof(zio_t, io_alloc_node)); - /* * Every pool starts with the default cachefile */ list_create(&spa->spa_config_list, sizeof (spa_config_dirent_t), offsetof(spa_config_dirent_t, scd_link)); --- 665,674 ----
*** 687,706 **** VERIFY(nvlist_alloc(&spa->spa_label_features, NV_UNIQUE_NAME, KM_SLEEP) == 0); } spa->spa_iokstat = kstat_create("zfs", 0, name, ! "disk", KSTAT_TYPE_IO, 1, 0); if (spa->spa_iokstat) { spa->spa_iokstat->ks_lock = &spa->spa_iokstat_lock; kstat_install(spa->spa_iokstat); } spa->spa_debug = ((zfs_flags & ZFS_DEBUG_SPA) != 0); spa->spa_min_ashift = INT_MAX; spa->spa_max_ashift = 0; /* * As a pool is being created, treat all features as disabled by * setting SPA_FEATURE_DISABLED for all entries in the feature * refcount cache. --- 696,724 ---- VERIFY(nvlist_alloc(&spa->spa_label_features, NV_UNIQUE_NAME, KM_SLEEP) == 0); } spa->spa_iokstat = kstat_create("zfs", 0, name, ! "zfs", KSTAT_TYPE_IO, 1, 0); if (spa->spa_iokstat) { spa->spa_iokstat->ks_lock = &spa->spa_iokstat_lock; kstat_install(spa->spa_iokstat); } + spa_trimstats_create(spa); + spa->spa_debug = ((zfs_flags & ZFS_DEBUG_SPA) != 0); + autosnap_init(spa); + + spa_cos_init(spa); + + spa_special_init(spa); + spa->spa_min_ashift = INT_MAX; spa->spa_max_ashift = 0; + wbc_init(&spa->spa_wbc, spa); /* * As a pool is being created, treat all features as disabled by * setting SPA_FEATURE_DISABLED for all entries in the feature * refcount cache.
*** 741,753 **** if (dp->scd_path != NULL) spa_strfree(dp->scd_path); kmem_free(dp, sizeof (spa_config_dirent_t)); } - avl_destroy(&spa->spa_alloc_tree); list_destroy(&spa->spa_config_list); nvlist_free(spa->spa_label_features); nvlist_free(spa->spa_load_info); spa_config_set(spa, NULL); mutex_enter(&cpu_lock); --- 759,778 ---- if (dp->scd_path != NULL) spa_strfree(dp->scd_path); kmem_free(dp, sizeof (spa_config_dirent_t)); } list_destroy(&spa->spa_config_list); + wbc_fini(&spa->spa_wbc); + + spa_special_fini(spa); + + spa_cos_fini(spa); + + autosnap_fini(spa); + nvlist_free(spa->spa_label_features); nvlist_free(spa->spa_load_info); spa_config_set(spa, NULL); mutex_enter(&cpu_lock);
*** 758,767 **** --- 783,794 ---- refcount_destroy(&spa->spa_refcount); spa_config_lock_destroy(spa); + spa_trimstats_destroy(spa); + kstat_delete(spa->spa_iokstat); spa->spa_iokstat = NULL; for (int t = 0; t < TXG_SIZE; t++) bplist_destroy(&spa->spa_free_bplist[t]);
*** 771,782 **** cv_destroy(&spa->spa_async_cv); cv_destroy(&spa->spa_evicting_os_cv); cv_destroy(&spa->spa_proc_cv); cv_destroy(&spa->spa_scrub_io_cv); cv_destroy(&spa->spa_suspend_cv); - mutex_destroy(&spa->spa_alloc_lock); mutex_destroy(&spa->spa_async_lock); mutex_destroy(&spa->spa_errlist_lock); mutex_destroy(&spa->spa_errlog_lock); mutex_destroy(&spa->spa_evicting_os_lock); mutex_destroy(&spa->spa_history_lock); --- 798,811 ---- cv_destroy(&spa->spa_async_cv); cv_destroy(&spa->spa_evicting_os_cv); cv_destroy(&spa->spa_proc_cv); cv_destroy(&spa->spa_scrub_io_cv); cv_destroy(&spa->spa_suspend_cv); + cv_destroy(&spa->spa_auto_trim_done_cv); + cv_destroy(&spa->spa_man_trim_update_cv); + cv_destroy(&spa->spa_man_trim_done_cv); mutex_destroy(&spa->spa_async_lock); mutex_destroy(&spa->spa_errlist_lock); mutex_destroy(&spa->spa_errlog_lock); mutex_destroy(&spa->spa_evicting_os_lock); mutex_destroy(&spa->spa_history_lock);
*** 785,794 **** --- 814,827 ---- mutex_destroy(&spa->spa_cksum_tmpls_lock); mutex_destroy(&spa->spa_scrub_lock); mutex_destroy(&spa->spa_suspend_lock); mutex_destroy(&spa->spa_vdev_top_lock); mutex_destroy(&spa->spa_iokstat_lock); + mutex_destroy(&spa->spa_cos_props_lock); + mutex_destroy(&spa->spa_vdev_props_lock); + mutex_destroy(&spa->spa_auto_trim_lock); + mutex_destroy(&spa->spa_man_trim_lock); kmem_free(spa, sizeof (spa_t)); } /*
*** 1108,1117 **** --- 1141,1153 ---- uint64_t spa_vdev_enter(spa_t *spa) { mutex_enter(&spa->spa_vdev_top_lock); mutex_enter(&spa_namespace_lock); + mutex_enter(&spa->spa_auto_trim_lock); + mutex_enter(&spa->spa_man_trim_lock); + spa_trim_stop_wait(spa); return (spa_vdev_config_enter(spa)); } /* * Internal implementation for spa_vdev_enter(). Used when a vdev
*** 1156,1165 **** --- 1192,1202 ---- /* * Verify the metaslab classes. */ ASSERT(metaslab_class_validate(spa_normal_class(spa)) == 0); ASSERT(metaslab_class_validate(spa_log_class(spa)) == 0); + ASSERT(metaslab_class_validate(spa_special_class(spa)) == 0); spa_config_exit(spa, SCL_ALL, spa); /* * Panic the system if the specified tag requires it. This
*** 1186,1196 ****
  
      /*
       * If the config changed, update the config cache.
       */
      if (config_changed)
!         spa_write_cachefile(spa, B_FALSE, B_TRUE);
  }
  
  /*
   * Unlock the spa_t after adding or removing a vdev. Besides undoing the
   * locking of spa_vdev_enter(), we also want make sure the transactions have
--- 1223,1233 ----
  
      /*
       * If the config changed, update the config cache.
       */
      if (config_changed)
!         spa_config_sync(spa, B_FALSE, B_TRUE);
  }
  
  /*
   * Unlock the spa_t after adding or removing a vdev. Besides undoing the
   * locking of spa_vdev_enter(), we also want make sure the transactions have
*** 1199,1208 **** --- 1236,1247 ---- */ int spa_vdev_exit(spa_t *spa, vdev_t *vd, uint64_t txg, int error) { spa_vdev_config_exit(spa, vd, txg, error, FTAG); + mutex_exit(&spa->spa_man_trim_lock); + mutex_exit(&spa->spa_auto_trim_lock); mutex_exit(&spa_namespace_lock); mutex_exit(&spa->spa_vdev_top_lock); return (error); }
*** 1270,1280 ****
      /*
       * If the config changed, update the config cache.
       */
      if (config_changed) {
          mutex_enter(&spa_namespace_lock);
!         spa_write_cachefile(spa, B_FALSE, B_TRUE);
          mutex_exit(&spa_namespace_lock);
      }
  
      return (error);
  }
--- 1309,1319 ----
      /*
       * If the config changed, update the config cache.
       */
      if (config_changed) {
          mutex_enter(&spa_namespace_lock);
!         spa_config_sync(spa, B_FALSE, B_TRUE);
          mutex_exit(&spa_namespace_lock);
      }
  
      return (error);
  }
*** 1348,1358 ****
  
      txg_wait_synced(spa->spa_dsl_pool, 0);
  
      /*
       * Sync the updated config cache.
       */
!     spa_write_cachefile(spa, B_FALSE, B_TRUE);
  
      spa_close(spa, FTAG);
  
      mutex_exit(&spa_namespace_lock);
--- 1387,1397 ----
  
      txg_wait_synced(spa->spa_dsl_pool, 0);
  
      /*
       * Sync the updated config cache.
       */
!     spa_config_sync(spa, B_FALSE, B_TRUE);
  
      spa_close(spa, FTAG);
  
      mutex_exit(&spa_namespace_lock);
*** 1406,1415 **** --- 1445,1483 ---- spa_guid_exists(uint64_t pool_guid, uint64_t device_guid) { return (spa_by_guid(pool_guid, device_guid) != NULL); } + /* + * Similar to spa_guid_exists, but uses the spa_config_guid and doesn't + * filter the check by pool state (as spa_guid_exists does). This is + * used to protect against attempting to spa_add the same pool (with the + * same pool GUID) under different names. This situation can happen if + * the boot_archive contains an outdated zpool.cache file after a pool + * rename. That would make us import the pool twice, resulting in data + * corruption. Normally the boot_archive shouldn't contain a zpool.cache + * file, but if due to misconfiguration it does, this function serves as + * a failsafe to prevent the double import. + */ + boolean_t + spa_config_guid_exists(uint64_t pool_guid) + { + spa_t *spa; + + ASSERT(MUTEX_HELD(&spa_namespace_lock)); + if (pool_guid == 0) + return (B_FALSE); + + for (spa = avl_first(&spa_namespace_avl); spa != NULL; + spa = AVL_NEXT(&spa_namespace_avl, spa)) { + if (spa->spa_config_guid == pool_guid) + return (B_TRUE); + } + + return (B_FALSE); + } + char * spa_strdup(const char *s) { size_t len; char *new;
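Editor's note: spa_config_guid_exists() pairs with the early GUID capture in spa_add() (see the *** 620,631 **** hunk above), so the import path can refuse a second import of the same pool under a different name before any further state is built. The fragment below is only a sketch of that intended call pattern, since spa_import() itself is not part of this diff; the EEXIST return is an assumption.

        /*
         * Illustrative fragment: before adding a new spa_t for an import,
         * bail out if a pool with this config GUID is already known under
         * any name (e.g. via a stale zpool.cache in the boot_archive).
         */
        mutex_enter(&spa_namespace_lock);
        if (spa_config_guid_exists(pool_guid)) {
                mutex_exit(&spa_namespace_lock);
                return (SET_ERROR(EEXIST));
        }
        spa = spa_add(pool, config, altroot);
        mutex_exit(&spa_namespace_lock);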
*** 1564,1579 **** spa_is_initializing(spa_t *spa) { return (spa->spa_is_initializing); } - boolean_t - spa_indirect_vdevs_loaded(spa_t *spa) - { - return (spa->spa_indirect_vdevs_loaded); - } - blkptr_t * spa_get_rootblkptr(spa_t *spa) { return (&spa->spa_ubsync.ub_rootbp); } --- 1632,1641 ----
*** 1696,1705 **** --- 1758,1811 ---- { return (lsize * spa_asize_inflation); } /* + * Get either on disk (phys == B_TRUE) or possible in core DDT size + */ + uint64_t + spa_get_ddts_size(spa_t *spa, boolean_t phys) + { + if (phys) + return (spa->spa_ddt_dsize); + + return (spa->spa_ddt_msize); + } + + /* + * Check to see if we need to stop DDT growth to stay within some limit + */ + boolean_t + spa_enable_dedup_cap(spa_t *spa) + { + if (zfs_ddt_byte_ceiling != 0) { + if (zfs_ddts_msize > zfs_ddt_byte_ceiling) { + /* need to limit DDT to an in core bytecount */ + return (B_TRUE); + } + } else if (zfs_ddt_limit_type == DDT_LIMIT_TO_ARC) { + if (zfs_ddts_msize > *arc_ddt_evict_threshold) { + /* need to limit DDT to fit into ARC */ + return (B_TRUE); + } + } else if (zfs_ddt_limit_type == DDT_LIMIT_TO_L2ARC) { + if (spa->spa_l2arc_ddt_devs_size != 0) { + if (spa_get_ddts_size(spa, B_TRUE) > + spa->spa_l2arc_ddt_devs_size) { + /* limit DDT to fit into L2ARC DDT dev */ + return (B_TRUE); + } + } else if (zfs_ddts_msize > *arc_ddt_evict_threshold) { + /* no L2ARC DDT dev - limit DDT to fit into ARC */ + return (B_TRUE); + } + } + + return (B_FALSE); + } + + /* * Return the amount of slop space in bytes. It is 1/32 of the pool (3.2%), * or at least 128MB, unless that would cause it to be more than half the * pool size. * * See the comment above spa_slop_shift for details.
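Editor's note: spa_enable_dedup_cap() centralizes the NEX-3165 policy: stop DDT growth at an absolute byte ceiling if zfs_ddt_byte_ceiling is set, otherwise against the ARC eviction threshold, or against the capacity of dedicated L2ARC DDT devices when zfs_ddt_limit_type selects L2ARC. Below is a hedged sketch of how a write path could consult it; the spa_ddt_capped flag and the zp_dedup override shown here are assumptions (compare NEX-5856), not code from this hunk.

        /*
         * Illustrative fragment: once the cap engages, mark the pool as
         * capped and write new blocks without dedup so the DDT stops
         * growing.
         */
        if (spa_enable_dedup_cap(spa)) {
                spa->spa_ddt_capped = B_TRUE;   /* assumed field */
                zp->zp_dedup = B_FALSE;         /* fall back to normal write */
        }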
*** 1720,1749 **** void spa_update_dspace(spa_t *spa) { spa->spa_dspace = metaslab_class_get_dspace(spa_normal_class(spa)) + ddt_get_dedup_dspace(spa); ! if (spa->spa_vdev_removal != NULL) { /* ! * We can't allocate from the removing device, so ! * subtract its size. This prevents the DMU/DSL from ! * filling up the (now smaller) pool while we are in the ! * middle of removing the device. ! * ! * Note that the DMU/DSL doesn't actually know or care ! * how much space is allocated (it does its own tracking ! * of how much space has been logically used). So it ! * doesn't matter that the data we are moving may be ! * allocated twice (on the old device and the new ! * device). */ ! vdev_t *vd = spa->spa_vdev_removal->svr_vdev; ! spa->spa_dspace -= spa_deflate(spa) ? ! vd->vdev_stat.vs_dspace : vd->vdev_stat.vs_space; } } /* * Return the failure mode that has been set to this pool. The default * behavior will be to block all I/Os when a complete failure occurs. */ uint8_t --- 1826,1880 ---- void spa_update_dspace(spa_t *spa) { spa->spa_dspace = metaslab_class_get_dspace(spa_normal_class(spa)) + ddt_get_dedup_dspace(spa); ! } ! ! /* ! * EXPERIMENTAL ! * Use exponential moving average to track root vdev iotime, as well as top ! * level vdev iotime. ! * The principle: avg_new = avg_prev + (cur - avg_prev) * a / 100; a is ! * tuneable. For example, if a = 10 (alpha = 0.1), it will take 20 iterations, ! * or 100 seconds at 5 second txg commit intervals for the values from last 20 ! * iterations to account for 66% of the moving average. ! * Currently, the challenge is that we keep track of iotime in cumulative ! * nanoseconds since zpool import, both for leaf and top vdevs, so a way of ! * getting delta pre/post txg commit is required. ! */ ! ! void ! spa_update_latency(spa_t *spa) ! { ! vdev_t *rvd = spa->spa_root_vdev; ! vdev_stat_t *rvs = &rvd->vdev_stat; ! for (int c = 0; c < rvd->vdev_children; c++) { ! vdev_t *cvd = rvd->vdev_child[c]; ! vdev_stat_t *cvs = &cvd->vdev_stat; ! mutex_enter(&rvd->vdev_stat_lock); ! ! for (int t = 0; t < ZIO_TYPES; t++) { ! /* ! * Non-trivial bit here. We update the moving latency ! * average for each child vdev separately, but since we ! * want the average to settle at the same rate ! * regardless of top level vdev count, we effectively ! * divide our alpha by number of children of the root ! * vdev to account for that. */ ! rvs->vs_latency[t] += ((((int64_t)cvs->vs_latency[t] - ! (int64_t)rvs->vs_latency[t]) * ! (int64_t)zfs_root_latency_alpha) / 100) / ! (int64_t)(rvd->vdev_children); } + mutex_exit(&rvd->vdev_stat_lock); + } } + /* * Return the failure mode that has been set to this pool. The default * behavior will be to block all I/Os when a complete failure occurs. */ uint8_t
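Editor's note: the spa_update_latency() comment gives the recurrence avg_new = avg_prev + (cur - avg_prev) * a / 100, with a = zfs_root_latency_alpha. The small stand-alone program below (plain user-land C, not kernel code) simply iterates that recurrence with a = 10 and a constant input so the convergence behaviour described above can be observed directly.

    #include <stdio.h>
    #include <stdint.h>

    int
    main(void)
    {
            int64_t alpha = 10;       /* zfs_root_latency_alpha default */
            int64_t avg = 0;          /* rolling average, nanoseconds */
            int64_t cur = 1000000;    /* constant observed latency */

            /* avg_new = avg_prev + (cur - avg_prev) * a / 100 */
            for (int i = 1; i <= 30; i++) {
                    avg += (cur - avg) * alpha / 100;
                    printf("iteration %2d: avg = %lld ns\n",
                        i, (long long)avg);
            }
            return (0);
    }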
*** 1762,1771 **** --- 1893,1908 ---- spa_version(spa_t *spa) { return (spa->spa_ubsync.ub_version); } + int + spa_get_obj_mtx_sz(spa_t *spa) + { + return (spa->spa_obj_mtx_sz); + } + boolean_t spa_deflate(spa_t *spa) { return (spa->spa_deflate); }
*** 1780,1789 **** --- 1917,1932 ---- spa_log_class(spa_t *spa) { return (spa->spa_log_class); } + metaslab_class_t * + spa_special_class(spa_t *spa) + { + return (spa->spa_special_class); + } + void spa_evicting_os_register(spa_t *spa, objset_t *os) { mutex_enter(&spa->spa_evicting_os_lock); list_insert_head(&spa->spa_evicting_os_list, os);
*** 1808,1817 **** --- 1951,1970 ---- mutex_exit(&spa->spa_evicting_os_lock); dmu_buf_user_evict_wait(); } + uint64_t + spa_class_alloc_percentage(metaslab_class_t *mc) + { + uint64_t capacity = mc->mc_space; + uint64_t alloc = mc->mc_alloc; + uint64_t one_percent = capacity / 100; + + return (alloc / one_percent); + } + int spa_max_replication(spa_t *spa) { /* * As of SPA_VERSION == SPA_VERSION_DITTO_BLOCKS, we are able to
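Editor's note: spa_class_alloc_percentage() reports a whole-number percentage using integer arithmetic: with mc_space = 10 TiB and mc_alloc = 2.5 TiB, one_percent is 102.4 GiB and the function returns 25. Fractions are truncated, so 2.59 TiB allocated still reports 25.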
*** 1833,1842 **** --- 1986,2007 ---- spa_deadman_synctime(spa_t *spa) { return (spa->spa_deadman_synctime); } + spa_force_trim_t + spa_get_force_trim(spa_t *spa) + { + return (spa->spa_force_trim); + } + + spa_auto_trim_t + spa_get_auto_trim(spa_t *spa) + { + return (spa->spa_auto_trim); + } + uint64_t dva_get_dsize_sync(spa_t *spa, const dva_t *dva) { uint64_t asize = DVA_GET_ASIZE(dva); uint64_t dsize = asize;
*** 1849,1878 **** } return (dsize); } uint64_t bp_get_dsize_sync(spa_t *spa, const blkptr_t *bp) { uint64_t dsize = 0; ! for (int d = 0; d < BP_GET_NDVAS(bp); d++) dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]); return (dsize); } uint64_t bp_get_dsize(spa_t *spa, const blkptr_t *bp) { ! uint64_t dsize = 0; spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER); ! for (int d = 0; d < BP_GET_NDVAS(bp); d++) ! dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]); spa_config_exit(spa, SCL_VDEV, FTAG); return (dsize); } --- 2014,2054 ---- } return (dsize); } + /* + * This function walks over the all DVAs of the given BP and + * adds up their sizes. + */ uint64_t bp_get_dsize_sync(spa_t *spa, const blkptr_t *bp) { + /* + * SPECIAL-BP has two DVAs, but DVA[0] in this case is a + * temporary DVA, and after migration only the DVA[1] + * contains valid data. Therefore, we start walking for + * these BPs from DVA[1]. + */ + int start_dva = BP_IS_SPECIAL(bp) ? 1 : 0; uint64_t dsize = 0; ! for (int d = start_dva; d < BP_GET_NDVAS(bp); d++) { dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]); + } return (dsize); } uint64_t bp_get_dsize(spa_t *spa, const blkptr_t *bp) { ! uint64_t dsize; spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER); ! dsize = bp_get_dsize_sync(spa, bp); spa_config_exit(spa, SCL_VDEV, FTAG); return (dsize); }
*** 1927,1936 **** --- 2103,2120 ---- avl_create(&spa_l2cache_avl, spa_l2cache_compare, sizeof (spa_aux_t), offsetof(spa_aux_t, aux_avl)); spa_mode_global = mode; + /* + * logevent_max_q_sz from log_sysevent.c gives us upper bound on + * the number of taskq entries; queueing of sysevents is serialized, + * so there is no need for more than one worker thread + */ + spa_sysevent_taskq = taskq_create("spa_sysevent_tq", 1, + minclsyspri, 1, 5000, TASKQ_DYNAMIC); + #ifdef _KERNEL spa_arch_init(); #else if (spa_mode_global != FREAD && dprintf_find_string("watch")) { arc_procfd = open("/proc/self/ctl", O_WRONLY);
*** 1952,1968 **** --- 2136,2158 ---- zil_init(); vdev_cache_stat_init(); zfs_prop_init(); zpool_prop_init(); zpool_feature_init(); + vdev_prop_init(); + cos_prop_init(); spa_config_load(); l2arc_start(); + ddt_init(); + dsl_scan_global_init(); } void spa_fini(void) { + ddt_fini(); + l2arc_stop(); spa_evict_all(); vdev_cache_stat_fini();
*** 1972,1981 **** --- 2162,2173 ---- metaslab_alloc_trace_fini(); range_tree_fini(); unique_fini(); refcount_fini(); + taskq_destroy(spa_sysevent_taskq); + avl_destroy(&spa_namespace_avl); avl_destroy(&spa_spare_avl); avl_destroy(&spa_l2cache_avl); cv_destroy(&spa_namespace_cv);
*** 2014,2024 ****
  }
  
  boolean_t
  spa_writeable(spa_t *spa)
  {
!     return (!!(spa->spa_mode & FWRITE) && spa->spa_trust_config);
  }
  
  /*
   * Returns true if there is a pending sync task in any of the current
   * syncing txg, the current quiescing txg, or the current open txg.
--- 2206,2216 ----
  }
  
  boolean_t
  spa_writeable(spa_t *spa)
  {
!     return (!!(spa->spa_mode & FWRITE));
  }
  
  /*
   * Returns true if there is a pending sync task in any of the current
   * syncing txg, the current quiescing txg, or the current open txg.
*** 2027,2036 **** --- 2219,2234 ---- spa_has_pending_synctask(spa_t *spa) { return (!txg_all_lists_empty(&spa->spa_dsl_pool->dp_sync_tasks)); } + boolean_t + spa_has_special(spa_t *spa) + { + return (spa->spa_special_class->mc_rotor != NULL); + } + int spa_mode(spa_t *spa) { return (spa->spa_mode); }
*** 2071,2080 **** --- 2269,2279 ---- spa->spa_scan_pass_scrub_pause = spa->spa_scan_pass_start; else spa->spa_scan_pass_scrub_pause = 0; spa->spa_scan_pass_scrub_spent_paused = 0; spa->spa_scan_pass_exam = 0; + spa->spa_scan_pass_work = 0; vdev_scan_stat_init(spa->spa_root_vdev); } /* * Get scan stats for zpool status reports
*** 2096,2109 **** --- 2295,2312 ---- ps->pss_examined = scn->scn_phys.scn_examined; ps->pss_to_process = scn->scn_phys.scn_to_process; ps->pss_processed = scn->scn_phys.scn_processed; ps->pss_errors = scn->scn_phys.scn_errors; ps->pss_state = scn->scn_phys.scn_state; + mutex_enter(&scn->scn_status_lock); + ps->pss_issued = scn->scn_bytes_issued; + mutex_exit(&scn->scn_status_lock); /* data not stored on disk */ ps->pss_pass_start = spa->spa_scan_pass_start; ps->pss_pass_exam = spa->spa_scan_pass_exam; + ps->pss_pass_work = spa->spa_scan_pass_work; ps->pss_pass_scrub_pause = spa->spa_scan_pass_scrub_pause; ps->pss_pass_scrub_spent_paused = spa->spa_scan_pass_scrub_spent_paused; return (0); }
*** 2121,2184 **** return (SPA_MAXBLOCKSIZE); else return (SPA_OLD_MAXBLOCKSIZE); } /* ! * Returns the txg that the last device removal completed. No indirect mappings ! * have been added since this txg. */ ! uint64_t ! spa_get_last_removal_txg(spa_t *spa) { ! uint64_t vdevid; ! uint64_t ret = -1ULL; ! spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER); ! /* ! * sr_prev_indirect_vdev is only modified while holding all the ! * config locks, so it is sufficient to hold SCL_VDEV as reader when ! * examining it. ! */ ! vdevid = spa->spa_removing_phys.sr_prev_indirect_vdev; ! while (vdevid != -1ULL) { ! vdev_t *vd = vdev_lookup_top(spa, vdevid); ! vdev_indirect_births_t *vib = vd->vdev_indirect_births; ! ASSERT3P(vd->vdev_ops, ==, &vdev_indirect_ops); ! /* ! * If the removal did not remap any data, we don't care. */ ! if (vdev_indirect_births_count(vib) != 0) { ! ret = vdev_indirect_births_last_entry_txg(vib); ! break; } ! vdevid = vd->vdev_indirect_config.vic_prev_indirect_vdev; } ! spa_config_exit(spa, SCL_VDEV, FTAG); ! IMPLY(ret != -1ULL, ! spa_feature_is_active(spa, SPA_FEATURE_DEVICE_REMOVAL)); ! return (ret); } ! boolean_t ! spa_trust_config(spa_t *spa) { ! return (spa->spa_trust_config); } ! uint64_t ! spa_missing_tvds_allowed(spa_t *spa) { ! return (spa->spa_missing_tvds_allowed); } void ! spa_set_missing_tvds(spa_t *spa, uint64_t missing) { ! spa->spa_missing_tvds = missing; } --- 2324,2530 ---- return (SPA_MAXBLOCKSIZE); else return (SPA_OLD_MAXBLOCKSIZE); } + boolean_t + spa_wbc_present(spa_t *spa) + { + return (spa->spa_wbc_mode != WBC_MODE_OFF); + } + + boolean_t + spa_wbc_active(spa_t *spa) + { + return (spa->spa_wbc_mode == WBC_MODE_ACTIVE); + } + + int + spa_wbc_mode(const char *name) + { + int ret = 0; + spa_t *spa; + + mutex_enter(&spa_namespace_lock); + spa = spa_lookup(name); + if (!spa) { + mutex_exit(&spa_namespace_lock); + return (-1); + } + + ret = (int)spa->spa_wbc_mode; + mutex_exit(&spa_namespace_lock); + return (ret); + } + + struct zfs_autosnap * + spa_get_autosnap(spa_t *spa) + { + return (&spa->spa_autosnap); + } + + wbc_data_t * + spa_get_wbc_data(spa_t *spa) + { + return (&spa->spa_wbc); + } + /* ! * Creates the trim kstats structure for a spa. */ ! static void ! spa_trimstats_create(spa_t *spa) { ! /* truncate pool name to accomodate "_trimstats" suffix */ ! char short_spa_name[KSTAT_STRLEN - 10]; ! char name[KSTAT_STRLEN]; ! ASSERT3P(spa->spa_trimstats, ==, NULL); ! ASSERT3P(spa->spa_trimstats_ks, ==, NULL); ! (void) snprintf(short_spa_name, sizeof (short_spa_name), "%s", ! spa->spa_name); ! (void) snprintf(name, sizeof (name), "%s_trimstats", short_spa_name); ! spa->spa_trimstats_ks = kstat_create("zfs", 0, name, "misc", ! KSTAT_TYPE_NAMED, sizeof (*spa->spa_trimstats) / ! sizeof (kstat_named_t), 0); ! if (spa->spa_trimstats_ks) { ! spa->spa_trimstats = spa->spa_trimstats_ks->ks_data; ! #ifdef _KERNEL ! kstat_named_init(&spa->spa_trimstats->st_extents, ! "extents", KSTAT_DATA_UINT64); ! kstat_named_init(&spa->spa_trimstats->st_bytes, ! "bytes", KSTAT_DATA_UINT64); ! kstat_named_init(&spa->spa_trimstats->st_extents_skipped, ! "extents_skipped", KSTAT_DATA_UINT64); ! kstat_named_init(&spa->spa_trimstats->st_bytes_skipped, ! "bytes_skipped", KSTAT_DATA_UINT64); ! kstat_named_init(&spa->spa_trimstats->st_auto_slow, ! "auto_slow", KSTAT_DATA_UINT64); ! #endif /* _KERNEL */ ! ! kstat_install(spa->spa_trimstats_ks); ! } else { ! cmn_err(CE_NOTE, "!Cannot create trim kstats for pool %s", ! spa->spa_name); ! } ! } ! ! /* ! * Destroys the trim kstats for a spa. */ ! 
static void ! spa_trimstats_destroy(spa_t *spa) ! { ! if (spa->spa_trimstats_ks) { ! kstat_delete(spa->spa_trimstats_ks); ! spa->spa_trimstats = NULL; ! spa->spa_trimstats_ks = NULL; } + } ! /* ! * Updates the numerical trim kstats for a spa. ! */ ! void ! spa_trimstats_update(spa_t *spa, uint64_t extents, uint64_t bytes, ! uint64_t extents_skipped, uint64_t bytes_skipped) ! { ! spa_trimstats_t *st = spa->spa_trimstats; ! if (st) { ! atomic_add_64(&st->st_extents.value.ui64, extents); ! atomic_add_64(&st->st_bytes.value.ui64, bytes); ! atomic_add_64(&st->st_extents_skipped.value.ui64, ! extents_skipped); ! atomic_add_64(&st->st_bytes_skipped.value.ui64, ! bytes_skipped); } ! } ! /* ! * Increments the slow-trim kstat for a spa. ! */ ! void ! spa_trimstats_auto_slow_incr(spa_t *spa) ! { ! spa_trimstats_t *st = spa->spa_trimstats; ! if (st) ! atomic_inc_64(&st->st_auto_slow.value.ui64); ! } ! /* ! * Creates the taskq used for dispatching auto-trim. This is called only when ! * the property is set to `on' or when the pool is loaded (and the autotrim ! * property is `on'). ! */ ! void ! spa_auto_trim_taskq_create(spa_t *spa) ! { ! char name[MAXPATHLEN]; ! ASSERT(MUTEX_HELD(&spa->spa_auto_trim_lock)); ! ASSERT(spa->spa_auto_trim_taskq == NULL); ! (void) snprintf(name, sizeof (name), "%s_auto_trim", spa->spa_name); ! spa->spa_auto_trim_taskq = taskq_create(name, 1, minclsyspri, 1, ! spa->spa_root_vdev->vdev_children, TASKQ_DYNAMIC); ! VERIFY(spa->spa_auto_trim_taskq != NULL); } ! /* ! * Creates the taskq for dispatching manual trim. This taskq is recreated ! * each time `zpool trim <poolname>' is issued and destroyed after the run ! * completes in an async spa request. ! */ ! void ! spa_man_trim_taskq_create(spa_t *spa) { ! char name[MAXPATHLEN]; ! ASSERT(MUTEX_HELD(&spa->spa_man_trim_lock)); ! spa_async_unrequest(spa, SPA_ASYNC_MAN_TRIM_TASKQ_DESTROY); ! if (spa->spa_man_trim_taskq != NULL) ! /* ! * The async taskq destroy has been pre-empted, so just ! * return, the taskq is still good to use. ! */ ! return; ! (void) snprintf(name, sizeof (name), "%s_man_trim", spa->spa_name); ! spa->spa_man_trim_taskq = taskq_create(name, 1, minclsyspri, 1, ! spa->spa_root_vdev->vdev_children, TASKQ_DYNAMIC); ! VERIFY(spa->spa_man_trim_taskq != NULL); } ! /* ! * Destroys the taskq created in spa_auto_trim_taskq_create. The taskq ! * is only destroyed when the autotrim property is set to `off'. ! */ ! void ! spa_auto_trim_taskq_destroy(spa_t *spa) { ! ASSERT(MUTEX_HELD(&spa->spa_auto_trim_lock)); ! ASSERT(spa->spa_auto_trim_taskq != NULL); ! while (spa->spa_num_auto_trimming != 0) ! cv_wait(&spa->spa_auto_trim_done_cv, &spa->spa_auto_trim_lock); ! taskq_destroy(spa->spa_auto_trim_taskq); ! spa->spa_auto_trim_taskq = NULL; } + /* + * Destroys the taskq created in spa_man_trim_taskq_create. The taskq is + * destroyed after a manual trim run completes from an async spa request. + * There is a bit of lag between an async request being issued at the + * completion of a trim run and it finally being acted on, hence why this + * function checks if new manual trimming threads haven't been re-spawned. + * If they have, we assume the async spa request been preempted by another + * manual trim request and we back off. + */ void ! spa_man_trim_taskq_destroy(spa_t *spa) { ! ASSERT(MUTEX_HELD(&spa->spa_man_trim_lock)); ! ASSERT(spa->spa_man_trim_taskq != NULL); ! if (spa->spa_num_man_trimming != 0) ! /* another trim got started before we got here, back off */ ! return; ! taskq_destroy(spa->spa_man_trim_taskq); ! 
spa->spa_man_trim_taskq = NULL; }
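Editor's note: the per-pool trim kstats created above are meant to be bumped from the TRIM I/O path; the fragment below is a hedged sketch of a caller, with the counter variable names invented for illustration.

        /*
         * Illustrative fragment: after deciding which extents are actually
         * queued as TRIM zios, update the pool-wide counters in one call.
         */
        spa_trimstats_update(spa, issued_extents, issued_bytes,
            skipped_extents, skipped_bytes);

        /* auto-trim could not keep up and extents were dropped */
        if (dropped_extents != 0)
                spa_trimstats_auto_slow_incr(spa);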