NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Conflicts:
usr/src/uts/common/fs/zfs/dbuf.c
usr/src/uts/common/fs/zfs/dmu.c
usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3562 filename normalization doesn't work for removes (sync with upstream)
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5272 KRRP: replicate snapshot properties
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-5318 Cleanup specialclass property (obsolete, not used) and fix related meta-to-special case
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5058 WBC: Race between the purging of window and opening new one
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-2830 ZFS smart compression
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
6495 Fix mutex leak in dmu_objset_find_dp
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Albert Lee <trisk@omniti.com>
6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
6160 /usr/lib/fs/zfs/bootinstall should use bootadm
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Adam Števko <adam.stevko@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (NULL is not an int)
6171 dsl_prop_unregister() slows down dataset eviction.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5981 Deadlock in dmu_objset_find_dp
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5610 zfs clone from different source and target pools produces coredump
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4028 use lz4 by default
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-3558 KRRP Integration
SUP-507 Delete or truncate of large files delayed on datasets with small recordsize
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Ilya Usvyatsky <ilya.usvyatsky@nexenta.com>
Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issues #7: Reconcile L2ARC and "special" use by datasets
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12619 rb4429 More dp->dp_config_rwlock holds
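
The "Align mutex tables ... to 64 bytes" entry above describes a common false-sharing fix. The following is a minimal user-space sketch of the idea, not the code from arc.c or dbuf.c: `demo_mutex_t`, `padded_mutex_t`, `lock_table`, `lock_for()` and `same_cache_line()` are hypothetical names, `demo_mutex_t` merely stands in for `kmutex_t`, and the 64-byte line size is taken from the commit message.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	CACHE_LINE_SIZE	64	/* assumed line size, per the commit message */
#define	LOCK_TABLE_SIZE	8	/* hypothetical table size */

/* Hypothetical stand-in for kmutex_t; only its size matters here. */
typedef struct demo_mutex {
	volatile long dm_owner;
} demo_mutex_t;

/*
 * Aligning the member to the cache line size makes the compiler pad the
 * enclosing struct up to a multiple of CACHE_LINE_SIZE, so each array
 * element occupies a cache line by itself.
 */
typedef struct padded_mutex {
	_Alignas(CACHE_LINE_SIZE) demo_mutex_t pm_lock;
} padded_mutex_t;

static padded_mutex_t lock_table[LOCK_TABLE_SIZE];

/* Pick the lock for a key by simple modulo hashing. */
static padded_mutex_t *
lock_for(uint64_t key)
{
	return (&lock_table[key % LOCK_TABLE_SIZE]);
}

/* Return nonzero when both addresses fall in the same cache line. */
static int
same_cache_line(const void *a, const void *b)
{
	return (((uintptr_t)a / CACHE_LINE_SIZE) ==
	    ((uintptr_t)b / CACHE_LINE_SIZE));
}
```

With this layout, two threads taking `lock_for(0)` and `lock_for(1)` touch different cache lines, so acquiring one lock should not invalidate the line holding the other. The cost is memory: each entry is padded out to a full line.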
@@ -51,18 +51,20 @@
#include <sys/zfs_ioctl.h>
#include <sys/sa.h>
#include <sys/zfs_onexit.h>
#include <sys/dsl_destroy.h>
#include <sys/vdev.h>
-#include <sys/zfeature.h>
+#include <sys/wbc.h>
/*
* Needed to close a window in dnode_move() that allows the objset to be freed
* before it can be safely accessed.
*/
krwlock_t os_lock;
+extern kmem_cache_t *zfs_ds_collector_cache;
+
/*
* Tunable to overwrite the maximum number of threads for the parallelization
* of dmu_objset_find_dp, needed to speed up the import of pools with many
* datasets.
* Default is 4 times the number of leaf vdevs.
@@ -76,20 +78,33 @@
*/
int dmu_rescan_dnode_threshold = 131072;
static void dmu_objset_find_dp_cb(void *arg);
+/* ARGSUSED */
+static int
+zfs_ds_collector_constructor(void *ds_el, void *unused, int flags)
+{
+ bzero(ds_el, sizeof (zfs_ds_collector_entry_t));
+ return (0);
+}
+
void
dmu_objset_init(void)
{
+ zfs_ds_collector_cache = kmem_cache_create("zfs_ds_collector_cache",
+ sizeof (zfs_ds_collector_entry_t),
+ 8, zfs_ds_collector_constructor,
+ NULL, NULL, NULL, NULL, 0);
rw_init(&os_lock, NULL, RW_DEFAULT, NULL);
}
void
dmu_objset_fini(void)
{
rw_destroy(&os_lock);
+ kmem_cache_destroy(zfs_ds_collector_cache);
}
spa_t *
dmu_objset_spa(objset_t *os)
{
@@ -177,10 +192,18 @@
os->os_compress = zio_compress_select(os->os_spa, newval,
ZIO_COMPRESS_ON);
}
static void
+smartcomp_changed_cb(void *arg, uint64_t newval)
+{
+ objset_t *os = arg;
+
+ os->os_smartcomp_enabled = newval ? B_TRUE : B_FALSE;
+}
+
+static void
copies_changed_cb(void *arg, uint64_t newval)
{
objset_t *os = arg;
/*
@@ -231,16 +254,24 @@
/*
* Inheritance and range checking should have been done by now.
*/
ASSERT(newval == ZFS_CACHE_ALL || newval == ZFS_CACHE_NONE ||
- newval == ZFS_CACHE_METADATA);
+ newval == ZFS_CACHE_METADATA || newval == ZFS_CACHE_DATA);
os->os_secondary_cache = newval;
}
static void
+zpl_meta_placement_changed_cb(void *arg, uint64_t newval)
+{
+ objset_t *os = arg;
+
+ os->os_zpl_meta_to_special = newval;
+}
+
+static void
sync_changed_cb(void *arg, uint64_t newval)
{
objset_t *os = arg;
/*
@@ -347,21 +378,10 @@
objset_t *os;
int i, err;
ASSERT(ds == NULL || MUTEX_HELD(&ds->ds_opening_lock));
- /*
- * The $ORIGIN dataset (if it exists) doesn't have an associated
- * objset, so there's no reason to open it. The $ORIGIN dataset
- * will not exist on pools older than SPA_VERSION_ORIGIN.
- */
- if (ds != NULL && spa_get_dsl(spa) != NULL &&
- spa_get_dsl(spa)->dp_origin_snap != NULL) {
- ASSERT3P(ds->ds_dir, !=,
- spa_get_dsl(spa)->dp_origin_snap->ds_dir);
- }
-
os = kmem_zalloc(sizeof (objset_t), KM_SLEEP);
os->os_dsl_dataset = ds;
os->os_spa = spa;
os->os_rootbp = bp;
if (!BP_IS_HOLE(os->os_rootbp)) {
@@ -432,10 +452,15 @@
if (err == 0) {
err = dsl_prop_register(ds,
zfs_prop_to_name(ZFS_PROP_SECONDARYCACHE),
secondary_cache_changed_cb, os);
}
+ if (err == 0) {
+ err = dsl_prop_register(ds,
+ zfs_prop_to_name(ZFS_PROP_ZPL_META_TO_METADEV),
+ zpl_meta_placement_changed_cb, os);
+ }
if (!ds->ds_is_snapshot) {
if (err == 0) {
err = dsl_prop_register(ds,
zfs_prop_to_name(ZFS_PROP_CHECKSUM),
checksum_changed_cb, os);
@@ -445,10 +470,15 @@
zfs_prop_to_name(ZFS_PROP_COMPRESSION),
compression_changed_cb, os);
}
if (err == 0) {
err = dsl_prop_register(ds,
+ zfs_prop_to_name(ZFS_PROP_SMARTCOMPRESSION),
+ smartcomp_changed_cb, os);
+ }
+ if (err == 0) {
+ err = dsl_prop_register(ds,
zfs_prop_to_name(ZFS_PROP_COPIES),
copies_changed_cb, os);
}
if (err == 0) {
err = dsl_prop_register(ds,
@@ -474,11 +504,16 @@
if (err == 0) {
err = dsl_prop_register(ds,
zfs_prop_to_name(ZFS_PROP_RECORDSIZE),
recordsize_changed_cb, os);
}
+ if (err == 0) {
+ err = dsl_prop_register(ds,
+ zfs_prop_to_name(ZFS_PROP_WBC_MODE),
+ wbc_mode_changed, os);
}
+ }
if (needlock)
dsl_pool_config_exit(dmu_objset_pool(os), FTAG);
if (err != 0) {
arc_buf_destroy(os->os_phys_buf, &os->os_phys_buf);
kmem_free(os, sizeof (objset_t));
@@ -493,11 +528,20 @@
os->os_dedup_verify = B_FALSE;
os->os_logbias = ZFS_LOGBIAS_LATENCY;
os->os_sync = ZFS_SYNC_STANDARD;
os->os_primary_cache = ZFS_CACHE_ALL;
os->os_secondary_cache = ZFS_CACHE_ALL;
+ os->os_zpl_meta_to_special = 0;
}
+ /*
+ * These properties will be filled in by the logic in zfs_get_zplprop()
+ * when they are queried for the first time.
+ */
+ os->os_version = OBJSET_PROP_UNINITIALIZED;
+ os->os_normalization = OBJSET_PROP_UNINITIALIZED;
+ os->os_utf8only = OBJSET_PROP_UNINITIALIZED;
+ os->os_casesensitivity = OBJSET_PROP_UNINITIALIZED;
if (ds == NULL || !ds->ds_is_snapshot)
os->os_zil_header = os->os_phys->os_zil_header;
os->os_zil = zil_alloc(os, &os->os_zil_header);
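
The hunk above marks four objset fields as uninitialized so that they can be filled in on first query. A minimal sketch of that lazy-caching pattern follows, assuming a sentinel of all ones; `demo_objset_t`, `DEMO_PROP_UNINITIALIZED`, `demo_slow_lookup()` and `demo_get_version()` are hypothetical names and do not reproduce `zfs_get_zplprop()`.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sentinel: a value no real property is expected to take. */
#define	DEMO_PROP_UNINITIALIZED	UINT64_MAX

/* Hypothetical miniature of an objset holding one cached property. */
typedef struct demo_objset {
	uint64_t do_version;		/* cached value or the sentinel */
	int do_slow_lookups;		/* how often the slow path ran */
} demo_objset_t;

/* Start with the cache empty, as the hunk does at objset open time. */
static void
demo_objset_init(demo_objset_t *os)
{
	os->do_version = DEMO_PROP_UNINITIALIZED;
	os->do_slow_lookups = 0;
}

/* Stand-in for an expensive on-disk metadata read. */
static uint64_t
demo_slow_lookup(demo_objset_t *os)
{
	os->do_slow_lookups++;
	return (5);	/* arbitrary illustrative value */
}

/* Return the property, running the slow lookup only on first use. */
static uint64_t
demo_get_version(demo_objset_t *os)
{
	if (os->do_version == DEMO_PROP_UNINITIALIZED)
		os->do_version = demo_slow_lookup(os);
	return (os->do_version);
}
```

Repeated calls to `demo_get_version()` hit the cached field after the first one, which is the effect the NEX-19394 backport is after for `zfs get all`. The sketch ignores locking and error handling, both of which a real implementation would need.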
@@ -712,11 +756,11 @@
*/
if (dnode_add_ref(dn, FTAG)) {
list_insert_after(&os->os_dnodes, dn, &dn_marker);
mutex_exit(&os->os_lock);
- dnode_evict_dbufs(dn);
+ dnode_evict_dbufs(dn, DBUF_EVICT_ALL);
dnode_rele(dn, FTAG);
mutex_enter(&os->os_lock);
dn = list_next(&os->os_dnodes, &dn_marker);
list_remove(&os->os_dnodes, &dn_marker);
@@ -725,14 +769,14 @@
}
}
mutex_exit(&os->os_lock);
if (DMU_USERUSED_DNODE(os) != NULL) {
- dnode_evict_dbufs(DMU_GROUPUSED_DNODE(os));
- dnode_evict_dbufs(DMU_USERUSED_DNODE(os));
+ dnode_evict_dbufs(DMU_GROUPUSED_DNODE(os), DBUF_EVICT_ALL);
+ dnode_evict_dbufs(DMU_USERUSED_DNODE(os), DBUF_EVICT_ALL);
}
- dnode_evict_dbufs(DMU_META_DNODE(os));
+ dnode_evict_dbufs(DMU_META_DNODE(os), DBUF_EVICT_ALL);
}
/*
* Objset eviction processing is split into two pieces.
* The first marks the objset as evicting, evicts any dbufs that
@@ -1062,106 +1106,11 @@
return (dsl_sync_task(clone,
dmu_objset_clone_check, dmu_objset_clone_sync, &doca,
5, ZFS_SPACE_CHECK_NORMAL));
}
-static int
-dmu_objset_remap_indirects_impl(objset_t *os, uint64_t last_removed_txg)
-{
- int error = 0;
- uint64_t object = 0;
- while ((error = dmu_object_next(os, &object, B_FALSE, 0)) == 0) {
- error = dmu_object_remap_indirects(os, object,
- last_removed_txg);
- /*
- * If the ZPL removed the object before we managed to dnode_hold
- * it, we would get an ENOENT. If the ZPL declares its intent
- * to remove the object (dnode_free) before we manage to
- * dnode_hold it, we would get an EEXIST. In either case, we
- * want to continue remapping the other objects in the objset;
- * in all other cases, we want to break early.
- */
- if (error != 0 && error != ENOENT && error != EEXIST) {
- break;
- }
- }
- if (error == ESRCH) {
- error = 0;
- }
- return (error);
-}
-
int
-dmu_objset_remap_indirects(const char *fsname)
-{
- int error = 0;
- objset_t *os = NULL;
- uint64_t last_removed_txg;
- uint64_t remap_start_txg;
- dsl_dir_t *dd;
-
- error = dmu_objset_hold(fsname, FTAG, &os);
- if (error != 0) {
- return (error);
- }
- dd = dmu_objset_ds(os)->ds_dir;
-
- if (!spa_feature_is_enabled(dmu_objset_spa(os),
- SPA_FEATURE_OBSOLETE_COUNTS)) {
- dmu_objset_rele(os, FTAG);
- return (SET_ERROR(ENOTSUP));
- }
-
- if (dsl_dataset_is_snapshot(dmu_objset_ds(os))) {
- dmu_objset_rele(os, FTAG);
- return (SET_ERROR(EINVAL));
- }
-
- /*
- * If there has not been a removal, we're done.
- */
- last_removed_txg = spa_get_last_removal_txg(dmu_objset_spa(os));
- if (last_removed_txg == -1ULL) {
- dmu_objset_rele(os, FTAG);
- return (0);
- }
-
- /*
- * If we have remapped since the last removal, we're done.
- */
- if (dsl_dir_is_zapified(dd)) {
- uint64_t last_remap_txg;
- if (zap_lookup(spa_meta_objset(dmu_objset_spa(os)),
- dd->dd_object, DD_FIELD_LAST_REMAP_TXG,
- sizeof (last_remap_txg), 1, &last_remap_txg) == 0 &&
- last_remap_txg > last_removed_txg) {
- dmu_objset_rele(os, FTAG);
- return (0);
- }
- }
-
- dsl_dataset_long_hold(dmu_objset_ds(os), FTAG);
- dsl_pool_rele(dmu_objset_pool(os), FTAG);
-
- remap_start_txg = spa_last_synced_txg(dmu_objset_spa(os));
- error = dmu_objset_remap_indirects_impl(os, last_removed_txg);
- if (error == 0) {
- /*
- * We update the last_remap_txg to be the start txg so that
- * we can guarantee that every block older than last_remap_txg
- * that can be remapped has been remapped.
- */
- error = dsl_dir_update_last_remap_txg(dd, remap_start_txg);
- }
-
- dsl_dataset_long_rele(dmu_objset_ds(os), FTAG);
- dsl_dataset_rele(dmu_objset_ds(os), FTAG);
-
- return (error);
-}
-
-int
dmu_objset_snapshot_one(const char *fsname, const char *snapname)
{
int err;
char *longsnap = kmem_asprintf("%s@%s", fsname, snapname);
nvlist_t *snaps = fnvlist_alloc();
@@ -1311,11 +1260,11 @@
dmu_write_policy(os, NULL, 0, 0, &zp);
zio = arc_write(pio, os->os_spa, tx->tx_txg,
blkptr_copy, os->os_phys_buf, DMU_OS_IS_L2CACHEABLE(os),
&zp, dmu_objset_write_ready, NULL, NULL, dmu_objset_write_done,
- os, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb);
+ os, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb, NULL);
/*
* Sync special dnodes - the parent IO for the sync is the root block
*/
DMU_META_DNODE(os)->dn_zio = zio;
@@ -1851,10 +1800,56 @@
return (zap_lookup_norm(ds->ds_dir->dd_pool->dp_meta_objset,
dsl_dataset_phys(ds)->ds_snapnames_zapobj, name, 8, 1, &ignored,
MT_NORMALIZE, real, maxlen, conflict));
}
+int
+dmu_clone_list_next(objset_t *os, int len, char *name,
+ uint64_t *idp, uint64_t *offp)
+{
+ dsl_dataset_t *ds = os->os_dsl_dataset, *clone;
+ zap_cursor_t cursor;
+ zap_attribute_t attr;
+ char buf[MAXNAMELEN];
+
+ ASSERT(dsl_pool_config_held(dmu_objset_pool(os)));
+
+ if (dsl_dataset_phys(ds)->ds_next_clones_obj == 0)
+ return (SET_ERROR(ENOENT));
+
+ zap_cursor_init_serialized(&cursor,
+ ds->ds_dir->dd_pool->dp_meta_objset,
+ dsl_dataset_phys(ds)->ds_next_clones_obj, *offp);
+
+ if (zap_cursor_retrieve(&cursor, &attr) != 0) {
+ zap_cursor_fini(&cursor);
+ return (SET_ERROR(ENOENT));
+ }
+
+ VERIFY0(dsl_dataset_hold_obj(ds->ds_dir->dd_pool,
+ attr.za_first_integer, FTAG, &clone));
+
+ dsl_dir_name(clone->ds_dir, buf);
+
+ dsl_dataset_rele(clone, FTAG);
+
+ if (strlen(buf) >= len) {
+ zap_cursor_fini(&cursor);
+ return (SET_ERROR(ENAMETOOLONG));
+ }
+
+ (void) strcpy(name, buf);
+ if (idp != NULL)
+ *idp = attr.za_first_integer;
+
+ zap_cursor_advance(&cursor);
+ *offp = zap_cursor_serialize(&cursor);
+ zap_cursor_fini(&cursor);
+
+ return (0);
+}
+
int
dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
uint64_t *idp, uint64_t *offp, boolean_t *case_conflict)
{
dsl_dataset_t *ds = os->os_dsl_dataset;