NEX-19592 zfs_dbgmsg should not contain info about calculated latency
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-17348 The ZFS deadman timer is currently set too high
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
NEX-9200 Improve the scalability of attribute locking in zfs_zget
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-13140 DVA-throttle support for special-class
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9989 Changing volume names can result in double imports and data corruption
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-10069 ZFS_READONLY is a little too strict (fix test lint)
NEX-9553 Move ss_fill gap logic from scan algorithm into range_tree.c
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-6088 ZFS scrub/resilver take excessively long due to issuing lots of random IO
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5856 ddt_capped isn't reset when deduped dataset is destroyed
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-5553 ZFS auto-trim, manual-trim and scrub can race and deadlock
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5188 Removed special-vdev causes panic on read or on get size of special-bp
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5186 smf-tests contains built files and it shouldn't
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-5168 cleanup and productize non-default latency based writecache load-balancer
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-3729 KRRP changes mess up iostat(1M)
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-4807 writecache load-balancing statistics: several distinct problems, must be revisited and revised
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4876 On-demand TRIM shouldn't use system_taskq and should queue jobs
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4683 WRC: Special block pointer must know that it is special
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-4677 Fix for NEX-4619 build breakage
NEX-4620 ZFS autotrim triggering is unreliable
NEX-4622 On-demand TRIM code illogically enumerates metaslabs via mg_ms_tree
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
NEX-4619 Want kstats to monitor TRIM and UNMAP operation
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5818 zfs {ref}compressratio is incorrect with 4k sector size
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4245 WRC: Code cleanup and refactoring to simplify merge with upstream
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4203 spa_config_tryenter incorrectly handles the multiple-lock case
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3965 System may panic on the importing of pool with WRC
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Revert "NEX-3965 System may panic on the importing of pool with WRC"
This reverts commit 45bc50222913cddafde94621d28b78d6efaea897.
NEX-3984 On-demand TRIM
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Conflicts:
usr/src/common/zfs/zpool_prop.c
usr/src/uts/common/sys/fs/zfs.h
NEX-3965 System may panic on the importing of pool with WRC
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Conflicts:
usr/src/uts/common/io/scsi/targets/sd.c
usr/src/uts/common/sys/scsi/targets/sddef.h
NEX-3165 need some dedup improvements
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
4391 panic system rather than corrupting pool if we hit bug 4390
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-114 Heap leak when exporting/destroying pools with CoS
SUP-577 deadlock between zpool detach and syseventd
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Fixup merge results
re #13333 rb4362 - eliminated spa_update_iotime() to fix the stats
re #12643 rb4064 ZFS meta refactoring - vdev utilization tracking, auto-dedup
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
re #8346 rb2639 KT disk failures
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
*** 19,30 ****
* CDDL HEADER END
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2017 by Delphix. All rights reserved.
- * Copyright 2015 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
* Copyright 2013 Saso Kiselkov. All rights reserved.
* Copyright (c) 2014 Integros [integros.com]
* Copyright (c) 2017 Datto Inc.
*/
--- 19,30 ----
* CDDL HEADER END
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2017 by Delphix. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
+ * Copyright 2019 Nexenta Systems, Inc. All rights reserved.
* Copyright 2013 Saso Kiselkov. All rights reserved.
* Copyright (c) 2014 Integros [integros.com]
* Copyright (c) 2017 Datto Inc.
*/
*** 50,59 ****
--- 50,60 ----
#include <sys/dsl_scan.h>
#include <sys/fs/zfs.h>
#include <sys/metaslab_impl.h>
#include <sys/arc.h>
#include <sys/ddt.h>
+ #include <sys/cos.h>
#include "zfs_prop.h"
#include <sys/zfeature.h>
/*
* SPA locking
*** 224,233 ****
--- 225,242 ----
*
* spa_rename() is also implemented within this file since it requires
* manipulation of the namespace.
*/
+ struct spa_trimstats {
+ kstat_named_t st_extents; /* # of extents issued to zio */
+ kstat_named_t st_bytes; /* # of bytes issued to zio */
+ kstat_named_t st_extents_skipped; /* # of extents too small */
+ kstat_named_t st_bytes_skipped; /* bytes in extents_skipped */
+ kstat_named_t st_auto_slow; /* trim slow, exts dropped */
+ };
+
static avl_tree_t spa_namespace_avl;
kmutex_t spa_namespace_lock;
static kcondvar_t spa_namespace_cv;
static int spa_active_count;
int spa_max_replication_override = SPA_DVAS_PER_BP;
*** 239,257 ****
kmem_cache_t *spa_buffer_pool;
int spa_mode_global;
#ifdef ZFS_DEBUG
! /*
! * Everything except dprintf, spa, and indirect_remap is on by default
! * in debug builds.
! */
! int zfs_flags = ~(ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SPA | ZFS_DEBUG_INDIRECT_REMAP);
#else
int zfs_flags = 0;
#endif
/*
* zfs_recover can be set to nonzero to attempt to recover from
* otherwise-fatal errors, typically caused by on-disk corruption. When
* set, calls to zfs_panic_recover() will turn into warning messages.
* This should only be used as a last resort, as it typically results
--- 248,266 ----
kmem_cache_t *spa_buffer_pool;
int spa_mode_global;
#ifdef ZFS_DEBUG
! /* Everything except dprintf and spa is on by default in debug builds */
! int zfs_flags = ~(ZFS_DEBUG_DPRINTF | ZFS_DEBUG_SPA);
#else
int zfs_flags = 0;
#endif
+ #define ZFS_OBJ_MTX_DEFAULT_SZ 64
+ uint64_t spa_obj_mtx_sz = ZFS_OBJ_MTX_DEFAULT_SZ;
+
/*
* zfs_recover can be set to nonzero to attempt to recover from
* otherwise-fatal errors, typically caused by on-disk corruption. When
* set, calls to zfs_panic_recover() will turn into warning messages.
* This should only be used as a last resort, as it typically results
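The spa_obj_mtx_sz tunable added above (NEX-9200) sizes the per-pool array of object mutexes that zfs_zget serializes on; spa_add() clamps it to [1, INT_MAX] and falls back to the 64-entry default otherwise. As a module global it should be settable from /etc/system like the other tunables here; a sketch, with the value 128 chosen purely for illustration:

	* Widen the zfs_zget object-mutex array (default 64, see
	* ZFS_OBJ_MTX_DEFAULT_SZ); invalid values fall back to the default
	set zfs:spa_obj_mtx_sz = 128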
*** 289,306 ****
* leaking space in the "partial temporary" failure case.
*/
boolean_t zfs_free_leak_on_eio = B_FALSE;
/*
* Expiration time in milliseconds. This value has two meanings. First it is
* used to determine when the spa_deadman() logic should fire. By default the
! * spa_deadman() will fire if spa_sync() has not completed in 1000 seconds.
* Secondly, the value determines if an I/O is considered "hung". Any I/O that
* has not completed in zfs_deadman_synctime_ms is considered "hung" resulting
* in a system panic.
*/
! uint64_t zfs_deadman_synctime_ms = 1000000ULL;
/*
* Check time in milliseconds. This defines the frequency at which we check
* for hung I/O.
*/
--- 298,321 ----
* leaking space in the "partial temporary" failure case.
*/
boolean_t zfs_free_leak_on_eio = B_FALSE;
/*
+ * alpha for spa_update_latency() rolling average of pool latency, which
+ * is updated on every txg commit.
+ */
+ int64_t zfs_root_latency_alpha = 10;
+
+ /*
* Expiration time in milliseconds. This value has two meanings. First it is
* used to determine when the spa_deadman() logic should fire. By default the
! * spa_deadman() will fire if spa_sync() has not completed in 250 seconds.
* Secondly, the value determines if an I/O is considered "hung". Any I/O that
* has not completed in zfs_deadman_synctime_ms is considered "hung" resulting
* in a system panic.
*/
! uint64_t zfs_deadman_synctime_ms = 250000ULL;
/*
* Check time in milliseconds. This defines the frequency at which we check
* for hung I/O.
*/
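NEX-17348 lowers the default deadman expiry from 1000 to 250 seconds, in both the comment and zfs_deadman_synctime_ms itself. On systems where 250 seconds proves too aggressive, the tunable should be adjustable from /etc/system in the usual way; a sketch that simply restores the previous default:

	* Fire spa_deadman() only after 1000s without spa_sync() completing
	set zfs:zfs_deadman_synctime_ms = 1000000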
*** 352,391 ****
* See also the comments in zfs_space_check_t.
*/
int spa_slop_shift = 5;
uint64_t spa_min_slop = 128 * 1024 * 1024;
! /*PRINTFLIKE2*/
! void
! spa_load_failed(spa_t *spa, const char *fmt, ...)
! {
! va_list adx;
! char buf[256];
- va_start(adx, fmt);
- (void) vsnprintf(buf, sizeof (buf), fmt, adx);
- va_end(adx);
-
- zfs_dbgmsg("spa_load(%s, config %s): FAILED: %s", spa->spa_name,
- spa->spa_trust_config ? "trusted" : "untrusted", buf);
- }
-
- /*PRINTFLIKE2*/
- void
- spa_load_note(spa_t *spa, const char *fmt, ...)
- {
- va_list adx;
- char buf[256];
-
- va_start(adx, fmt);
- (void) vsnprintf(buf, sizeof (buf), fmt, adx);
- va_end(adx);
-
- zfs_dbgmsg("spa_load(%s, config %s): %s", spa->spa_name,
- spa->spa_trust_config ? "trusted" : "untrusted", buf);
- }
-
/*
* ==========================================================================
* SPA config locking
* ==========================================================================
*/
--- 367,379 ----
* See also the comments in zfs_space_check_t.
*/
int spa_slop_shift = 5;
uint64_t spa_min_slop = 128 * 1024 * 1024;
! static void spa_trimstats_create(spa_t *spa);
! static void spa_trimstats_destroy(spa_t *spa);
/*
* ==========================================================================
* SPA config locking
* ==========================================================================
*/
*** 474,484 ****
scl->scl_writer = curthread;
}
(void) refcount_add(&scl->scl_count, tag);
mutex_exit(&scl->scl_lock);
}
! ASSERT3U(wlocks_held, <=, locks);
}
void
spa_config_exit(spa_t *spa, int locks, void *tag)
{
--- 462,472 ----
scl->scl_writer = curthread;
}
(void) refcount_add(&scl->scl_count, tag);
mutex_exit(&scl->scl_lock);
}
! ASSERT(wlocks_held <= locks);
}
void
spa_config_exit(spa_t *spa, int locks, void *tag)
{
*** 585,594 ****
--- 573,583 ----
{
spa_t *spa;
spa_config_dirent_t *dp;
cyc_handler_t hdlr;
cyc_time_t when;
+ uint64_t guid;
ASSERT(MUTEX_HELD(&spa_namespace_lock));
spa = kmem_zalloc(sizeof (spa_t), KM_SLEEP);
*** 602,618 ****
mutex_init(&spa->spa_cksum_tmpls_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_scrub_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_suspend_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_vdev_top_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_iokstat_lock, NULL, MUTEX_DEFAULT, NULL);
! mutex_init(&spa->spa_alloc_lock, NULL, MUTEX_DEFAULT, NULL);
cv_init(&spa->spa_async_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_evicting_os_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_proc_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_scrub_io_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_suspend_cv, NULL, CV_DEFAULT, NULL);
for (int t = 0; t < TXG_SIZE; t++)
bplist_create(&spa->spa_free_bplist[t]);
(void) strlcpy(spa->spa_name, name, sizeof (spa->spa_name));
--- 591,615 ----
mutex_init(&spa->spa_cksum_tmpls_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_scrub_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_suspend_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_vdev_top_lock, NULL, MUTEX_DEFAULT, NULL);
mutex_init(&spa->spa_iokstat_lock, NULL, MUTEX_DEFAULT, NULL);
! mutex_init(&spa->spa_cos_props_lock, NULL, MUTEX_DEFAULT, NULL);
! mutex_init(&spa->spa_vdev_props_lock, NULL, MUTEX_DEFAULT, NULL);
! mutex_init(&spa->spa_perfmon.perfmon_lock, NULL, MUTEX_DEFAULT, NULL);
+ mutex_init(&spa->spa_auto_trim_lock, NULL, MUTEX_DEFAULT, NULL);
+ mutex_init(&spa->spa_man_trim_lock, NULL, MUTEX_DEFAULT, NULL);
+
cv_init(&spa->spa_async_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_evicting_os_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_proc_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_scrub_io_cv, NULL, CV_DEFAULT, NULL);
cv_init(&spa->spa_suspend_cv, NULL, CV_DEFAULT, NULL);
+ cv_init(&spa->spa_auto_trim_done_cv, NULL, CV_DEFAULT, NULL);
+ cv_init(&spa->spa_man_trim_update_cv, NULL, CV_DEFAULT, NULL);
+ cv_init(&spa->spa_man_trim_done_cv, NULL, CV_DEFAULT, NULL);
for (int t = 0; t < TXG_SIZE; t++)
bplist_create(&spa->spa_free_bplist[t]);
(void) strlcpy(spa->spa_name, name, sizeof (spa->spa_name));
*** 620,631 ****
spa->spa_freeze_txg = UINT64_MAX;
spa->spa_final_txg = UINT64_MAX;
spa->spa_load_max_txg = UINT64_MAX;
spa->spa_proc = &p0;
spa->spa_proc_state = SPA_PROC_NONE;
! spa->spa_trust_config = B_TRUE;
hdlr.cyh_func = spa_deadman;
hdlr.cyh_arg = spa;
hdlr.cyh_level = CY_LOW_LEVEL;
spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms);
--- 617,643 ----
spa->spa_freeze_txg = UINT64_MAX;
spa->spa_final_txg = UINT64_MAX;
spa->spa_load_max_txg = UINT64_MAX;
spa->spa_proc = &p0;
spa->spa_proc_state = SPA_PROC_NONE;
! if (spa_obj_mtx_sz < 1 || spa_obj_mtx_sz > INT_MAX)
! spa->spa_obj_mtx_sz = ZFS_OBJ_MTX_DEFAULT_SZ;
! else
! spa->spa_obj_mtx_sz = spa_obj_mtx_sz;
+ /*
+ * Grabbing the guid here is just so that spa_config_guid_exists can
+ * check early on to protect against doubled imports of the same pool
+ * under different names. If the GUID isn't provided here, we will
+ * let spa generate one later on during spa_load, although in that
+ * case we might not be able to provide the double-import protection.
+ */
+ if (nvlist_lookup_uint64(config, ZPOOL_CONFIG_POOL_GUID, &guid) == 0) {
+ spa->spa_config_guid = guid;
+ ASSERT(!spa_config_guid_exists(guid));
+ }
+
hdlr.cyh_func = spa_deadman;
hdlr.cyh_arg = spa;
hdlr.cyh_level = CY_LOW_LEVEL;
spa->spa_deadman_synctime = MSEC2NSEC(zfs_deadman_synctime_ms);
*** 653,665 ****
if (altroot) {
spa->spa_root = spa_strdup(altroot);
spa_active_count++;
}
- avl_create(&spa->spa_alloc_tree, zio_bookmark_compare,
- sizeof (zio_t), offsetof(zio_t, io_alloc_node));
-
/*
* Every pool starts with the default cachefile
*/
list_create(&spa->spa_config_list, sizeof (spa_config_dirent_t),
offsetof(spa_config_dirent_t, scd_link));
--- 665,674 ----
*** 687,706 ****
VERIFY(nvlist_alloc(&spa->spa_label_features, NV_UNIQUE_NAME,
KM_SLEEP) == 0);
}
spa->spa_iokstat = kstat_create("zfs", 0, name,
! "disk", KSTAT_TYPE_IO, 1, 0);
if (spa->spa_iokstat) {
spa->spa_iokstat->ks_lock = &spa->spa_iokstat_lock;
kstat_install(spa->spa_iokstat);
}
spa->spa_debug = ((zfs_flags & ZFS_DEBUG_SPA) != 0);
spa->spa_min_ashift = INT_MAX;
spa->spa_max_ashift = 0;
/*
* As a pool is being created, treat all features as disabled by
* setting SPA_FEATURE_DISABLED for all entries in the feature
* refcount cache.
--- 696,724 ----
VERIFY(nvlist_alloc(&spa->spa_label_features, NV_UNIQUE_NAME,
KM_SLEEP) == 0);
}
spa->spa_iokstat = kstat_create("zfs", 0, name,
! "zfs", KSTAT_TYPE_IO, 1, 0);
if (spa->spa_iokstat) {
spa->spa_iokstat->ks_lock = &spa->spa_iokstat_lock;
kstat_install(spa->spa_iokstat);
}
+ spa_trimstats_create(spa);
+
spa->spa_debug = ((zfs_flags & ZFS_DEBUG_SPA) != 0);
+ autosnap_init(spa);
+
+ spa_cos_init(spa);
+
+ spa_special_init(spa);
+
spa->spa_min_ashift = INT_MAX;
spa->spa_max_ashift = 0;
+ wbc_init(&spa->spa_wbc, spa);
/*
* As a pool is being created, treat all features as disabled by
* setting SPA_FEATURE_DISABLED for all entries in the feature
* refcount cache.
*** 741,753 ****
if (dp->scd_path != NULL)
spa_strfree(dp->scd_path);
kmem_free(dp, sizeof (spa_config_dirent_t));
}
- avl_destroy(&spa->spa_alloc_tree);
list_destroy(&spa->spa_config_list);
nvlist_free(spa->spa_label_features);
nvlist_free(spa->spa_load_info);
spa_config_set(spa, NULL);
mutex_enter(&cpu_lock);
--- 759,778 ----
if (dp->scd_path != NULL)
spa_strfree(dp->scd_path);
kmem_free(dp, sizeof (spa_config_dirent_t));
}
list_destroy(&spa->spa_config_list);
+ wbc_fini(&spa->spa_wbc);
+
+ spa_special_fini(spa);
+
+ spa_cos_fini(spa);
+
+ autosnap_fini(spa);
+
nvlist_free(spa->spa_label_features);
nvlist_free(spa->spa_load_info);
spa_config_set(spa, NULL);
mutex_enter(&cpu_lock);
*** 758,767 ****
--- 783,794 ----
refcount_destroy(&spa->spa_refcount);
spa_config_lock_destroy(spa);
+ spa_trimstats_destroy(spa);
+
kstat_delete(spa->spa_iokstat);
spa->spa_iokstat = NULL;
for (int t = 0; t < TXG_SIZE; t++)
bplist_destroy(&spa->spa_free_bplist[t]);
*** 771,782 ****
cv_destroy(&spa->spa_async_cv);
cv_destroy(&spa->spa_evicting_os_cv);
cv_destroy(&spa->spa_proc_cv);
cv_destroy(&spa->spa_scrub_io_cv);
cv_destroy(&spa->spa_suspend_cv);
- mutex_destroy(&spa->spa_alloc_lock);
mutex_destroy(&spa->spa_async_lock);
mutex_destroy(&spa->spa_errlist_lock);
mutex_destroy(&spa->spa_errlog_lock);
mutex_destroy(&spa->spa_evicting_os_lock);
mutex_destroy(&spa->spa_history_lock);
--- 798,811 ----
cv_destroy(&spa->spa_async_cv);
cv_destroy(&spa->spa_evicting_os_cv);
cv_destroy(&spa->spa_proc_cv);
cv_destroy(&spa->spa_scrub_io_cv);
cv_destroy(&spa->spa_suspend_cv);
+ cv_destroy(&spa->spa_auto_trim_done_cv);
+ cv_destroy(&spa->spa_man_trim_update_cv);
+ cv_destroy(&spa->spa_man_trim_done_cv);
mutex_destroy(&spa->spa_async_lock);
mutex_destroy(&spa->spa_errlist_lock);
mutex_destroy(&spa->spa_errlog_lock);
mutex_destroy(&spa->spa_evicting_os_lock);
mutex_destroy(&spa->spa_history_lock);
*** 785,794 ****
--- 814,827 ----
mutex_destroy(&spa->spa_cksum_tmpls_lock);
mutex_destroy(&spa->spa_scrub_lock);
mutex_destroy(&spa->spa_suspend_lock);
mutex_destroy(&spa->spa_vdev_top_lock);
mutex_destroy(&spa->spa_iokstat_lock);
+ mutex_destroy(&spa->spa_cos_props_lock);
+ mutex_destroy(&spa->spa_vdev_props_lock);
+ mutex_destroy(&spa->spa_auto_trim_lock);
+ mutex_destroy(&spa->spa_man_trim_lock);
kmem_free(spa, sizeof (spa_t));
}
/*
*** 1108,1117 ****
--- 1141,1153 ----
uint64_t
spa_vdev_enter(spa_t *spa)
{
mutex_enter(&spa->spa_vdev_top_lock);
mutex_enter(&spa_namespace_lock);
+ mutex_enter(&spa->spa_auto_trim_lock);
+ mutex_enter(&spa->spa_man_trim_lock);
+ spa_trim_stop_wait(spa);
return (spa_vdev_config_enter(spa));
}
/*
* Internal implementation for spa_vdev_enter(). Used when a vdev
*** 1156,1165 ****
--- 1192,1202 ----
/*
* Verify the metaslab classes.
*/
ASSERT(metaslab_class_validate(spa_normal_class(spa)) == 0);
ASSERT(metaslab_class_validate(spa_log_class(spa)) == 0);
+ ASSERT(metaslab_class_validate(spa_special_class(spa)) == 0);
spa_config_exit(spa, SCL_ALL, spa);
/*
* Panic the system if the specified tag requires it. This
*** 1186,1196 ****
/*
* If the config changed, update the config cache.
*/
if (config_changed)
! spa_write_cachefile(spa, B_FALSE, B_TRUE);
}
/*
* Unlock the spa_t after adding or removing a vdev. Besides undoing the
* locking of spa_vdev_enter(), we also want make sure the transactions have
--- 1223,1233 ----
/*
* If the config changed, update the config cache.
*/
if (config_changed)
! spa_config_sync(spa, B_FALSE, B_TRUE);
}
/*
* Unlock the spa_t after adding or removing a vdev. Besides undoing the
* locking of spa_vdev_enter(), we also want make sure the transactions have
*** 1199,1208 ****
--- 1236,1247 ----
*/
int
spa_vdev_exit(spa_t *spa, vdev_t *vd, uint64_t txg, int error)
{
spa_vdev_config_exit(spa, vd, txg, error, FTAG);
+ mutex_exit(&spa->spa_man_trim_lock);
+ mutex_exit(&spa->spa_auto_trim_lock);
mutex_exit(&spa_namespace_lock);
mutex_exit(&spa->spa_vdev_top_lock);
return (error);
}
*** 1270,1280 ****
/*
* If the config changed, update the config cache.
*/
if (config_changed) {
mutex_enter(&spa_namespace_lock);
! spa_write_cachefile(spa, B_FALSE, B_TRUE);
mutex_exit(&spa_namespace_lock);
}
return (error);
}
--- 1309,1319 ----
/*
* If the config changed, update the config cache.
*/
if (config_changed) {
mutex_enter(&spa_namespace_lock);
! spa_config_sync(spa, B_FALSE, B_TRUE);
mutex_exit(&spa_namespace_lock);
}
return (error);
}
*** 1348,1358 ****
txg_wait_synced(spa->spa_dsl_pool, 0);
/*
* Sync the updated config cache.
*/
! spa_write_cachefile(spa, B_FALSE, B_TRUE);
spa_close(spa, FTAG);
mutex_exit(&spa_namespace_lock);
--- 1387,1397 ----
txg_wait_synced(spa->spa_dsl_pool, 0);
/*
* Sync the updated config cache.
*/
! spa_config_sync(spa, B_FALSE, B_TRUE);
spa_close(spa, FTAG);
mutex_exit(&spa_namespace_lock);
*** 1406,1415 ****
--- 1445,1483 ----
spa_guid_exists(uint64_t pool_guid, uint64_t device_guid)
{
return (spa_by_guid(pool_guid, device_guid) != NULL);
}
+ /*
+ * Similar to spa_guid_exists, but uses the spa_config_guid and doesn't
+ * filter the check by pool state (as spa_guid_exists does). This is
+ * used to protect against attempting to spa_add the same pool (with the
+ * same pool GUID) under different names. This situation can happen if
+ * the boot_archive contains an outdated zpool.cache file after a pool
+ * rename. That would make us import the pool twice, resulting in data
+ * corruption. Normally the boot_archive shouldn't contain a zpool.cache
+ * file, but if due to misconfiguration it does, this function serves as
+ * a failsafe to prevent the double import.
+ */
+ boolean_t
+ spa_config_guid_exists(uint64_t pool_guid)
+ {
+ spa_t *spa;
+
+ ASSERT(MUTEX_HELD(&spa_namespace_lock));
+ if (pool_guid == 0)
+ return (B_FALSE);
+
+ for (spa = avl_first(&spa_namespace_avl); spa != NULL;
+ spa = AVL_NEXT(&spa_namespace_avl, spa)) {
+ if (spa->spa_config_guid == pool_guid)
+ return (B_TRUE);
+ }
+
+ return (B_FALSE);
+ }
+
char *
spa_strdup(const char *s)
{
size_t len;
char *new;
*** 1564,1579 ****
spa_is_initializing(spa_t *spa)
{
return (spa->spa_is_initializing);
}
- boolean_t
- spa_indirect_vdevs_loaded(spa_t *spa)
- {
- return (spa->spa_indirect_vdevs_loaded);
- }
-
blkptr_t *
spa_get_rootblkptr(spa_t *spa)
{
return (&spa->spa_ubsync.ub_rootbp);
}
--- 1632,1641 ----
*** 1696,1705 ****
--- 1758,1811 ----
{
return (lsize * spa_asize_inflation);
}
/*
+ * Get either the on-disk (phys == B_TRUE) or the possible in-core DDT size
+ */
+ uint64_t
+ spa_get_ddts_size(spa_t *spa, boolean_t phys)
+ {
+ if (phys)
+ return (spa->spa_ddt_dsize);
+
+ return (spa->spa_ddt_msize);
+ }
+
+ /*
+ * Check to see if we need to stop DDT growth to stay within some limit
+ */
+ boolean_t
+ spa_enable_dedup_cap(spa_t *spa)
+ {
+ if (zfs_ddt_byte_ceiling != 0) {
+ if (zfs_ddts_msize > zfs_ddt_byte_ceiling) {
+ /* need to limit DDT to an in core bytecount */
+ return (B_TRUE);
+ }
+ } else if (zfs_ddt_limit_type == DDT_LIMIT_TO_ARC) {
+ if (zfs_ddts_msize > *arc_ddt_evict_threshold) {
+ /* need to limit DDT to fit into ARC */
+ return (B_TRUE);
+ }
+ } else if (zfs_ddt_limit_type == DDT_LIMIT_TO_L2ARC) {
+ if (spa->spa_l2arc_ddt_devs_size != 0) {
+ if (spa_get_ddts_size(spa, B_TRUE) >
+ spa->spa_l2arc_ddt_devs_size) {
+ /* limit DDT to fit into L2ARC DDT dev */
+ return (B_TRUE);
+ }
+ } else if (zfs_ddts_msize > *arc_ddt_evict_threshold) {
+ /* no L2ARC DDT dev - limit DDT to fit into ARC */
+ return (B_TRUE);
+ }
+ }
+
+ return (B_FALSE);
+ }
+
+ /*
* Return the amount of slop space in bytes. It is 1/32 of the pool (3.2%),
* or at least 128MB, unless that would cause it to be more than half the
* pool size.
*
* See the comment above spa_slop_shift for details.
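Note that spa_enable_dedup_cap() above gives an absolute byte ceiling precedence over the ARC and L2ARC fit checks. Assuming zfs_ddt_byte_ceiling is a module-global tunable like the others referenced in this file, a minimal /etc/system sketch that caps the in-core DDT at 1 GiB:

	* Cap in-core DDT growth at 1 GiB (0 = use zfs_ddt_limit_type instead)
	set zfs:zfs_ddt_byte_ceiling = 0x40000000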
*** 1720,1749 ****
void
spa_update_dspace(spa_t *spa)
{
spa->spa_dspace = metaslab_class_get_dspace(spa_normal_class(spa)) +
ddt_get_dedup_dspace(spa);
! if (spa->spa_vdev_removal != NULL) {
/*
! * We can't allocate from the removing device, so
! * subtract its size. This prevents the DMU/DSL from
! * filling up the (now smaller) pool while we are in the
! * middle of removing the device.
! *
! * Note that the DMU/DSL doesn't actually know or care
! * how much space is allocated (it does its own tracking
! * of how much space has been logically used). So it
! * doesn't matter that the data we are moving may be
! * allocated twice (on the old device and the new
! * device).
*/
! vdev_t *vd = spa->spa_vdev_removal->svr_vdev;
! spa->spa_dspace -= spa_deflate(spa) ?
! vd->vdev_stat.vs_dspace : vd->vdev_stat.vs_space;
}
}
/*
* Return the failure mode that has been set to this pool. The default
* behavior will be to block all I/Os when a complete failure occurs.
*/
uint8_t
--- 1826,1880 ----
void
spa_update_dspace(spa_t *spa)
{
spa->spa_dspace = metaslab_class_get_dspace(spa_normal_class(spa)) +
ddt_get_dedup_dspace(spa);
! }
!
! /*
! * EXPERIMENTAL
! * Use an exponential moving average to track root vdev iotime, as well as
! * top-level vdev iotime.
! * The principle: avg_new = avg_prev + (cur - avg_prev) * a / 100, where a
! * is tuneable. For example, if a = 10 (alpha = 0.1), the most recent 10
! * iterations (50 seconds at 5-second txg commit intervals) account for
! * about 65% of the moving average.
! * Currently, the challenge is that we keep track of iotime in cumulative
! * nanoseconds since zpool import, both for leaf and top-level vdevs, so we
! * need a way of getting the delta across each txg commit.
! */
!
! void
! spa_update_latency(spa_t *spa)
! {
! vdev_t *rvd = spa->spa_root_vdev;
! vdev_stat_t *rvs = &rvd->vdev_stat;
! for (int c = 0; c < rvd->vdev_children; c++) {
! vdev_t *cvd = rvd->vdev_child[c];
! vdev_stat_t *cvs = &cvd->vdev_stat;
! mutex_enter(&rvd->vdev_stat_lock);
!
! for (int t = 0; t < ZIO_TYPES; t++) {
!
/*
! * Non-trivial bit here. We update the moving latency
! * average for each child vdev separately, but since we
! * want the average to settle at the same rate
! * regardless of top level vdev count, we effectively
! * divide our alpha by number of children of the root
! * vdev to account for that.
*/
! rvs->vs_latency[t] += ((((int64_t)cvs->vs_latency[t] -
! (int64_t)rvs->vs_latency[t]) *
! (int64_t)zfs_root_latency_alpha) / 100) /
! (int64_t)(rvd->vdev_children);
}
+ mutex_exit(&rvd->vdev_stat_lock);
+ }
}
+
/*
* Return the failure mode that has been set to this pool. The default
* behavior will be to block all I/Os when a complete failure occurs.
*/
uint8_t
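As a concrete check on the averaging in spa_update_latency(), here is a minimal standalone C sketch of one update step (the helper name is illustrative, not part of the source). With alpha_pct = 10 and a single root child, an average starting at 0 against a constant 1000ns input moves 0, 100, 190, 271, ..., crossing about 65% of the input after 10 txg commits:

	#include <sys/types.h>

	/* one EMA step, mirroring the per-zio-type update above */
	static int64_t
	ema_step(int64_t avg, int64_t cur, int64_t alpha_pct, int64_t nchildren)
	{
		return (avg + (((cur - avg) * alpha_pct) / 100) / nchildren);
	}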
*** 1762,1771 ****
--- 1893,1908 ----
spa_version(spa_t *spa)
{
return (spa->spa_ubsync.ub_version);
}
+ int
+ spa_get_obj_mtx_sz(spa_t *spa)
+ {
+ return (spa->spa_obj_mtx_sz);
+ }
+
boolean_t
spa_deflate(spa_t *spa)
{
return (spa->spa_deflate);
}
*** 1780,1789 ****
--- 1917,1932 ----
spa_log_class(spa_t *spa)
{
return (spa->spa_log_class);
}
+ metaslab_class_t *
+ spa_special_class(spa_t *spa)
+ {
+ return (spa->spa_special_class);
+ }
+
void
spa_evicting_os_register(spa_t *spa, objset_t *os)
{
mutex_enter(&spa->spa_evicting_os_lock);
list_insert_head(&spa->spa_evicting_os_list, os);
*** 1808,1817 ****
--- 1951,1970 ----
mutex_exit(&spa->spa_evicting_os_lock);
dmu_buf_user_evict_wait();
}
+ uint64_t
+ spa_class_alloc_percentage(metaslab_class_t *mc)
+ {
+ uint64_t capacity = mc->mc_space;
+ uint64_t alloc = mc->mc_alloc;
+ uint64_t one_percent = capacity / 100;
+
+ /* guard against dividing by zero when the class is nearly empty */
+ if (one_percent == 0)
+ return (0);
+
+ return (alloc / one_percent);
+ }
+
int
spa_max_replication(spa_t *spa)
{
/*
* As of SPA_VERSION == SPA_VERSION_DITTO_BLOCKS, we are able to
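For scale: with mc_space of 10 GiB and mc_alloc of 2.5 GiB, one_percent in spa_class_alloc_percentage() above works out to about 102 MiB and the function returns 25 (the values here are illustrative).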
*** 1833,1842 ****
--- 1986,2007 ----
spa_deadman_synctime(spa_t *spa)
{
return (spa->spa_deadman_synctime);
}
+ spa_force_trim_t
+ spa_get_force_trim(spa_t *spa)
+ {
+ return (spa->spa_force_trim);
+ }
+
+ spa_auto_trim_t
+ spa_get_auto_trim(spa_t *spa)
+ {
+ return (spa->spa_auto_trim);
+ }
+
uint64_t
dva_get_dsize_sync(spa_t *spa, const dva_t *dva)
{
uint64_t asize = DVA_GET_ASIZE(dva);
uint64_t dsize = asize;
*** 1849,1878 ****
}
return (dsize);
}
uint64_t
bp_get_dsize_sync(spa_t *spa, const blkptr_t *bp)
{
uint64_t dsize = 0;
! for (int d = 0; d < BP_GET_NDVAS(bp); d++)
dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]);
return (dsize);
}
uint64_t
bp_get_dsize(spa_t *spa, const blkptr_t *bp)
{
! uint64_t dsize = 0;
spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER);
! for (int d = 0; d < BP_GET_NDVAS(bp); d++)
! dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]);
spa_config_exit(spa, SCL_VDEV, FTAG);
return (dsize);
}
--- 2014,2054 ----
}
return (dsize);
}
+ /*
+ * This function walks over the all DVAs of the given BP and
+ * adds up their sizes.
+ */
uint64_t
bp_get_dsize_sync(spa_t *spa, const blkptr_t *bp)
{
+ /*
+ * SPECIAL-BP has two DVAs, but DVA[0] in this case is a
+ * temporary DVA, and after migration only the DVA[1]
+ * contains valid data. Therefore, we start walking for
+ * these BPs from DVA[1].
+ */
+ int start_dva = BP_IS_SPECIAL(bp) ? 1 : 0;
uint64_t dsize = 0;
! for (int d = start_dva; d < BP_GET_NDVAS(bp); d++) {
dsize += dva_get_dsize_sync(spa, &bp->blk_dva[d]);
+ }
return (dsize);
}
uint64_t
bp_get_dsize(spa_t *spa, const blkptr_t *bp)
{
! uint64_t dsize;
spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER);
! dsize = bp_get_dsize_sync(spa, bp);
spa_config_exit(spa, SCL_VDEV, FTAG);
return (dsize);
}
*** 1927,1936 ****
--- 2103,2120 ----
avl_create(&spa_l2cache_avl, spa_l2cache_compare, sizeof (spa_aux_t),
offsetof(spa_aux_t, aux_avl));
spa_mode_global = mode;
+ /*
+ * logevent_max_q_sz from log_sysevent.c gives us an upper bound on
+ * the number of taskq entries; queueing of sysevents is serialized,
+ * so there is no need for more than one worker thread
+ */
+ spa_sysevent_taskq = taskq_create("spa_sysevent_tq", 1,
+ minclsyspri, 1, 5000, TASKQ_DYNAMIC);
+
#ifdef _KERNEL
spa_arch_init();
#else
if (spa_mode_global != FREAD && dprintf_find_string("watch")) {
arc_procfd = open("/proc/self/ctl", O_WRONLY);
*** 1952,1968 ****
--- 2136,2158 ----
zil_init();
vdev_cache_stat_init();
zfs_prop_init();
zpool_prop_init();
zpool_feature_init();
+ vdev_prop_init();
+ cos_prop_init();
spa_config_load();
l2arc_start();
+ ddt_init();
+ dsl_scan_global_init();
}
void
spa_fini(void)
{
+ ddt_fini();
+
l2arc_stop();
spa_evict_all();
vdev_cache_stat_fini();
*** 1972,1981 ****
--- 2162,2173 ----
metaslab_alloc_trace_fini();
range_tree_fini();
unique_fini();
refcount_fini();
+ taskq_destroy(spa_sysevent_taskq);
+
avl_destroy(&spa_namespace_avl);
avl_destroy(&spa_spare_avl);
avl_destroy(&spa_l2cache_avl);
cv_destroy(&spa_namespace_cv);
*** 2014,2024 ****
}
boolean_t
spa_writeable(spa_t *spa)
{
! return (!!(spa->spa_mode & FWRITE) && spa->spa_trust_config);
}
/*
* Returns true if there is a pending sync task in any of the current
* syncing txg, the current quiescing txg, or the current open txg.
--- 2206,2216 ----
}
boolean_t
spa_writeable(spa_t *spa)
{
! return (!!(spa->spa_mode & FWRITE));
}
/*
* Returns true if there is a pending sync task in any of the current
* syncing txg, the current quiescing txg, or the current open txg.
*** 2027,2036 ****
--- 2219,2234 ----
spa_has_pending_synctask(spa_t *spa)
{
return (!txg_all_lists_empty(&spa->spa_dsl_pool->dp_sync_tasks));
}
+ boolean_t
+ spa_has_special(spa_t *spa)
+ {
+ return (spa->spa_special_class->mc_rotor != NULL);
+ }
+
int
spa_mode(spa_t *spa)
{
return (spa->spa_mode);
}
*** 2071,2080 ****
--- 2269,2279 ----
spa->spa_scan_pass_scrub_pause = spa->spa_scan_pass_start;
else
spa->spa_scan_pass_scrub_pause = 0;
spa->spa_scan_pass_scrub_spent_paused = 0;
spa->spa_scan_pass_exam = 0;
+ spa->spa_scan_pass_work = 0;
vdev_scan_stat_init(spa->spa_root_vdev);
}
/*
* Get scan stats for zpool status reports
*** 2096,2109 ****
--- 2295,2312 ----
ps->pss_examined = scn->scn_phys.scn_examined;
ps->pss_to_process = scn->scn_phys.scn_to_process;
ps->pss_processed = scn->scn_phys.scn_processed;
ps->pss_errors = scn->scn_phys.scn_errors;
ps->pss_state = scn->scn_phys.scn_state;
+ mutex_enter(&scn->scn_status_lock);
+ ps->pss_issued = scn->scn_bytes_issued;
+ mutex_exit(&scn->scn_status_lock);
/* data not stored on disk */
ps->pss_pass_start = spa->spa_scan_pass_start;
ps->pss_pass_exam = spa->spa_scan_pass_exam;
+ ps->pss_pass_work = spa->spa_scan_pass_work;
ps->pss_pass_scrub_pause = spa->spa_scan_pass_scrub_pause;
ps->pss_pass_scrub_spent_paused = spa->spa_scan_pass_scrub_spent_paused;
return (0);
}
*** 2121,2184 ****
return (SPA_MAXBLOCKSIZE);
else
return (SPA_OLD_MAXBLOCKSIZE);
}
/*
! * Returns the txg that the last device removal completed. No indirect mappings
! * have been added since this txg.
*/
! uint64_t
! spa_get_last_removal_txg(spa_t *spa)
{
! uint64_t vdevid;
! uint64_t ret = -1ULL;
! spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER);
! /*
! * sr_prev_indirect_vdev is only modified while holding all the
! * config locks, so it is sufficient to hold SCL_VDEV as reader when
! * examining it.
! */
! vdevid = spa->spa_removing_phys.sr_prev_indirect_vdev;
! while (vdevid != -1ULL) {
! vdev_t *vd = vdev_lookup_top(spa, vdevid);
! vdev_indirect_births_t *vib = vd->vdev_indirect_births;
! ASSERT3P(vd->vdev_ops, ==, &vdev_indirect_ops);
! /*
! * If the removal did not remap any data, we don't care.
*/
! if (vdev_indirect_births_count(vib) != 0) {
! ret = vdev_indirect_births_last_entry_txg(vib);
! break;
}
! vdevid = vd->vdev_indirect_config.vic_prev_indirect_vdev;
}
! spa_config_exit(spa, SCL_VDEV, FTAG);
! IMPLY(ret != -1ULL,
! spa_feature_is_active(spa, SPA_FEATURE_DEVICE_REMOVAL));
! return (ret);
}
! boolean_t
! spa_trust_config(spa_t *spa)
{
! return (spa->spa_trust_config);
}
! uint64_t
! spa_missing_tvds_allowed(spa_t *spa)
{
! return (spa->spa_missing_tvds_allowed);
}
void
! spa_set_missing_tvds(spa_t *spa, uint64_t missing)
{
! spa->spa_missing_tvds = missing;
}
--- 2324,2530 ----
return (SPA_MAXBLOCKSIZE);
else
return (SPA_OLD_MAXBLOCKSIZE);
}
+ boolean_t
+ spa_wbc_present(spa_t *spa)
+ {
+ return (spa->spa_wbc_mode != WBC_MODE_OFF);
+ }
+
+ boolean_t
+ spa_wbc_active(spa_t *spa)
+ {
+ return (spa->spa_wbc_mode == WBC_MODE_ACTIVE);
+ }
+
+ int
+ spa_wbc_mode(const char *name)
+ {
+ int ret = 0;
+ spa_t *spa;
+
+ mutex_enter(&spa_namespace_lock);
+ spa = spa_lookup(name);
+ if (!spa) {
+ mutex_exit(&spa_namespace_lock);
+ return (-1);
+ }
+
+ ret = (int)spa->spa_wbc_mode;
+ mutex_exit(&spa_namespace_lock);
+ return (ret);
+ }
+
+ struct zfs_autosnap *
+ spa_get_autosnap(spa_t *spa)
+ {
+ return (&spa->spa_autosnap);
+ }
+
+ wbc_data_t *
+ spa_get_wbc_data(spa_t *spa)
+ {
+ return (&spa->spa_wbc);
+ }
+
/*
! * Creates the trim kstats structure for a spa.
*/
! static void
! spa_trimstats_create(spa_t *spa)
{
! /* truncate pool name to accommodate "_trimstats" suffix */
! char short_spa_name[KSTAT_STRLEN - 10];
! char name[KSTAT_STRLEN];
! ASSERT3P(spa->spa_trimstats, ==, NULL);
! ASSERT3P(spa->spa_trimstats_ks, ==, NULL);
! (void) snprintf(short_spa_name, sizeof (short_spa_name), "%s",
! spa->spa_name);
! (void) snprintf(name, sizeof (name), "%s_trimstats", short_spa_name);
! spa->spa_trimstats_ks = kstat_create("zfs", 0, name, "misc",
! KSTAT_TYPE_NAMED, sizeof (*spa->spa_trimstats) /
! sizeof (kstat_named_t), 0);
! if (spa->spa_trimstats_ks) {
! spa->spa_trimstats = spa->spa_trimstats_ks->ks_data;
! #ifdef _KERNEL
! kstat_named_init(&spa->spa_trimstats->st_extents,
! "extents", KSTAT_DATA_UINT64);
! kstat_named_init(&spa->spa_trimstats->st_bytes,
! "bytes", KSTAT_DATA_UINT64);
! kstat_named_init(&spa->spa_trimstats->st_extents_skipped,
! "extents_skipped", KSTAT_DATA_UINT64);
! kstat_named_init(&spa->spa_trimstats->st_bytes_skipped,
! "bytes_skipped", KSTAT_DATA_UINT64);
! kstat_named_init(&spa->spa_trimstats->st_auto_slow,
! "auto_slow", KSTAT_DATA_UINT64);
! #endif /* _KERNEL */
!
! kstat_install(spa->spa_trimstats_ks);
! } else {
! cmn_err(CE_NOTE, "!Cannot create trim kstats for pool %s",
! spa->spa_name);
! }
! }
!
! /*
! * Destroys the trim kstats for a spa.
*/
! static void
! spa_trimstats_destroy(spa_t *spa)
! {
! if (spa->spa_trimstats_ks) {
! kstat_delete(spa->spa_trimstats_ks);
! spa->spa_trimstats = NULL;
! spa->spa_trimstats_ks = NULL;
}
+ }
! /*
! * Updates the numerical trim kstats for a spa.
! */
! void
! spa_trimstats_update(spa_t *spa, uint64_t extents, uint64_t bytes,
! uint64_t extents_skipped, uint64_t bytes_skipped)
! {
! spa_trimstats_t *st = spa->spa_trimstats;
! if (st) {
! atomic_add_64(&st->st_extents.value.ui64, extents);
! atomic_add_64(&st->st_bytes.value.ui64, bytes);
! atomic_add_64(&st->st_extents_skipped.value.ui64,
! extents_skipped);
! atomic_add_64(&st->st_bytes_skipped.value.ui64,
! bytes_skipped);
}
! }
! /*
! * Increments the slow-trim kstat for a spa.
! */
! void
! spa_trimstats_auto_slow_incr(spa_t *spa)
! {
! spa_trimstats_t *st = spa->spa_trimstats;
! if (st)
! atomic_inc_64(&st->st_auto_slow.value.ui64);
! }
! /*
! * Creates the taskq used for dispatching auto-trim. This is called only when
! * the property is set to `on' or when the pool is loaded (and the autotrim
! * property is `on').
! */
! void
! spa_auto_trim_taskq_create(spa_t *spa)
! {
! char name[MAXPATHLEN];
! ASSERT(MUTEX_HELD(&spa->spa_auto_trim_lock));
! ASSERT(spa->spa_auto_trim_taskq == NULL);
! (void) snprintf(name, sizeof (name), "%s_auto_trim", spa->spa_name);
! spa->spa_auto_trim_taskq = taskq_create(name, 1, minclsyspri, 1,
! spa->spa_root_vdev->vdev_children, TASKQ_DYNAMIC);
! VERIFY(spa->spa_auto_trim_taskq != NULL);
}
! /*
! * Creates the taskq for dispatching manual trim. This taskq is recreated
! * each time `zpool trim <poolname>' is issued and destroyed after the run
! * completes in an async spa request.
! */
! void
! spa_man_trim_taskq_create(spa_t *spa)
{
! char name[MAXPATHLEN];
! ASSERT(MUTEX_HELD(&spa->spa_man_trim_lock));
! spa_async_unrequest(spa, SPA_ASYNC_MAN_TRIM_TASKQ_DESTROY);
! if (spa->spa_man_trim_taskq != NULL)
! /*
! * The async taskq destroy has been pre-empted, so just
! * return, the taskq is still good to use.
! */
! return;
! (void) snprintf(name, sizeof (name), "%s_man_trim", spa->spa_name);
! spa->spa_man_trim_taskq = taskq_create(name, 1, minclsyspri, 1,
! spa->spa_root_vdev->vdev_children, TASKQ_DYNAMIC);
! VERIFY(spa->spa_man_trim_taskq != NULL);
}
! /*
! * Destroys the taskq created in spa_auto_trim_taskq_create. The taskq
! * is only destroyed when the autotrim property is set to `off'.
! */
! void
! spa_auto_trim_taskq_destroy(spa_t *spa)
{
! ASSERT(MUTEX_HELD(&spa->spa_auto_trim_lock));
! ASSERT(spa->spa_auto_trim_taskq != NULL);
! while (spa->spa_num_auto_trimming != 0)
! cv_wait(&spa->spa_auto_trim_done_cv, &spa->spa_auto_trim_lock);
! taskq_destroy(spa->spa_auto_trim_taskq);
! spa->spa_auto_trim_taskq = NULL;
}
+ /*
+ * Destroys the taskq created in spa_man_trim_taskq_create. The taskq is
+ * destroyed after a manual trim run completes from an async spa request.
+ * There is a bit of lag between an async request being issued at the
+ * completion of a trim run and it finally being acted on, which is why this
+ * function checks whether new manual trimming threads have been re-spawned.
+ * If they have, we assume the async spa request has been preempted by another
+ * manual trim request and we back off.
+ */
void
! spa_man_trim_taskq_destroy(spa_t *spa)
{
! ASSERT(MUTEX_HELD(&spa->spa_man_trim_lock));
! ASSERT(spa->spa_man_trim_taskq != NULL);
! if (spa->spa_num_man_trimming != 0)
! /* another trim got started before we got here, back off */
! return;
! taskq_destroy(spa->spa_man_trim_taskq);
! spa->spa_man_trim_taskq = NULL;
}
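The counters installed by spa_trimstats_create() can be read from userland with kstat(1M); for a pool named tank (name illustrative), the five named values (extents, bytes, extents_skipped, bytes_skipped and auto_slow) appear under the misc-class kstat:

	# kstat -m zfs -n tank_trimstats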