NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3562 filename normalization doesn't work for removes (sync with upstream)
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5272 KRRP: replicate snapshot properties
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-5318 Cleanup specialclass property (obsolete, not used) and fix related meta-to-special case
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5058 WBC: Race between the purging of window and opening new one
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-2830 ZFS smart compression
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
6495 Fix mutex leak in dmu_objset_find_dp
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Albert Lee <trisk@omniti.com>
6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
6160 /usr/lib/fs/zfs/bootinstall should use bootadm
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Adam Števko <adam.stevko@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (NULL is not an int)
6171 dsl_prop_unregister() slows down dataset eviction.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5981 Deadlock in dmu_objset_find_dp
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5610 zfs clone from different source and target pools produces coredump
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4028 use lz4 by default
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-3558 KRRP Integration
SUP-507 Delete or truncate of large files delayed on datasets with small recordsize
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Ilya Usvyatsky <ilya.usvyatsky@nexenta.com>
Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issues #7: Reconcile L2ARC and "special" use by datasets
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12619 rb4429 More dp->dp_config_rwlock holds

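The cache-line alignment mentioned in the log above (one mutex per 64-byte line to avoid false sharing) can be sketched in portable C11 as follows. This is an illustration only, not the code from arc.c or dbuf.c: `demo_mutex_t` is a stand-in for `kmutex_t`, and every `demo_`/`DEMO_` name, including the table size, is hypothetical.

```c
#include <assert.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define DEMO_CACHE_LINE 64      /* assumed cache-line size, in bytes */
#define DEMO_TABLE_SIZE 256     /* hypothetical number of table slots */

/* Stand-in for kmutex_t; the real kernel type is not reproduced here. */
typedef struct demo_mutex {
        volatile int held;
} demo_mutex_t;

/*
 * Aligning the only member raises both the alignment and the size of
 * the wrapper to a full cache line, so neighbouring array slots never
 * share a line.
 */
typedef struct demo_padded_mutex {
        alignas(DEMO_CACHE_LINE) demo_mutex_t lock;
} demo_padded_mutex_t;

static demo_padded_mutex_t demo_lock_table[DEMO_TABLE_SIZE];

/* Byte distance between two neighbouring table slots. */
static size_t
demo_slot_stride(void)
{
        return ((size_t)((uintptr_t)&demo_lock_table[1] -
            (uintptr_t)&demo_lock_table[0]));
}
```

With this layout, two threads contending on different slots should not invalidate each other's cache lines, at the cost of one full line of memory per mutex.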
@@ -51,18 +51,20 @@
 #include <sys/zfs_ioctl.h>
 #include <sys/sa.h>
 #include <sys/zfs_onexit.h>
 #include <sys/dsl_destroy.h>
 #include <sys/vdev.h>
-#include <sys/zfeature.h>
+#include <sys/wbc.h>
 
 /*
  * Needed to close a window in dnode_move() that allows the objset to be freed
  * before it can be safely accessed.
  */
 krwlock_t os_lock;
 
+extern kmem_cache_t *zfs_ds_collector_cache;
+
 /*
  * Tunable to overwrite the maximum number of threads for the parallelization
  * of dmu_objset_find_dp, needed to speed up the import of pools with many
  * datasets.
  * Default is 4 times the number of leaf vdevs.

@@ -76,20 +78,33 @@
  */
 int dmu_rescan_dnode_threshold = 131072;
 
 static void dmu_objset_find_dp_cb(void *arg);
 
+/* ARGSUSED */
+static int
+zfs_ds_collector_constructor(void *ds_el, void *unused, int flags)
+{
+        bzero(ds_el, sizeof (zfs_ds_collector_entry_t));
+        return (0);
+}
+
 void
 dmu_objset_init(void)
 {
+        zfs_ds_collector_cache = kmem_cache_create("zfs_ds_collector_cache",
+            sizeof (zfs_ds_collector_entry_t),
+            8, zfs_ds_collector_constructor,
+            NULL, NULL, NULL, NULL, 0);
         rw_init(&os_lock, NULL, RW_DEFAULT, NULL);
 }
 
 void
 dmu_objset_fini(void)
 {
         rw_destroy(&os_lock);
+        kmem_cache_destroy(zfs_ds_collector_cache);
 }
 
 spa_t *
 dmu_objset_spa(objset_t *os)
 {

@@ -177,10 +192,18 @@
         os->os_compress = zio_compress_select(os->os_spa, newval,
             ZIO_COMPRESS_ON);
 }
 
 static void
+smartcomp_changed_cb(void *arg, uint64_t newval)
+{
+        objset_t *os = arg;
+
+        os->os_smartcomp_enabled = newval ? B_TRUE : B_FALSE;
+}
+
+static void
 copies_changed_cb(void *arg, uint64_t newval)
 {
         objset_t *os = arg;
 
         /*

@@ -231,16 +254,24 @@
 
         /*
          * Inheritance and range checking should have been done by now.
          */
         ASSERT(newval == ZFS_CACHE_ALL || newval == ZFS_CACHE_NONE ||
-            newval == ZFS_CACHE_METADATA);
+            newval == ZFS_CACHE_METADATA || newval == ZFS_CACHE_DATA);
 
         os->os_secondary_cache = newval;
 }
 
 static void
+zpl_meta_placement_changed_cb(void *arg, uint64_t newval)
+{
+        objset_t *os = arg;
+
+        os->os_zpl_meta_to_special = newval;
+}
+
+static void
 sync_changed_cb(void *arg, uint64_t newval)
 {
         objset_t *os = arg;
 
         /*

@@ -347,21 +378,10 @@
         objset_t *os;
         int i, err;
 
         ASSERT(ds == NULL || MUTEX_HELD(&ds->ds_opening_lock));
 
-        /*
-         * The $ORIGIN dataset (if it exists) doesn't have an associated
-         * objset, so there's no reason to open it. The $ORIGIN dataset
-         * will not exist on pools older than SPA_VERSION_ORIGIN.
-         */
-        if (ds != NULL && spa_get_dsl(spa) != NULL &&
-            spa_get_dsl(spa)->dp_origin_snap != NULL) {
-                ASSERT3P(ds->ds_dir, !=,
-                    spa_get_dsl(spa)->dp_origin_snap->ds_dir);
-        }
-
         os = kmem_zalloc(sizeof (objset_t), KM_SLEEP);
         os->os_dsl_dataset = ds;
         os->os_spa = spa;
         os->os_rootbp = bp;
         if (!BP_IS_HOLE(os->os_rootbp)) {

@@ -432,10 +452,15 @@
                 if (err == 0) {
                         err = dsl_prop_register(ds,
                             zfs_prop_to_name(ZFS_PROP_SECONDARYCACHE),
                             secondary_cache_changed_cb, os);
                 }
+                if (err == 0) {
+                        err = dsl_prop_register(ds,
+                            zfs_prop_to_name(ZFS_PROP_ZPL_META_TO_METADEV),
+                            zpl_meta_placement_changed_cb, os);
+                }
                 if (!ds->ds_is_snapshot) {
                         if (err == 0) {
                                 err = dsl_prop_register(ds,
                                     zfs_prop_to_name(ZFS_PROP_CHECKSUM),
                                     checksum_changed_cb, os);

@@ -445,10 +470,15 @@
                                     zfs_prop_to_name(ZFS_PROP_COMPRESSION),
                                     compression_changed_cb, os);
                         }
                         if (err == 0) {
                                 err = dsl_prop_register(ds,
+                                    zfs_prop_to_name(ZFS_PROP_SMARTCOMPRESSION),
+                                    smartcomp_changed_cb, os);
+                        }
+                        if (err == 0) {
+                                err = dsl_prop_register(ds,
                                     zfs_prop_to_name(ZFS_PROP_COPIES),
                                     copies_changed_cb, os);
                         }
                         if (err == 0) {
                                 err = dsl_prop_register(ds,

@@ -474,11 +504,16 @@
                         if (err == 0) {
                                 err = dsl_prop_register(ds,
                                     zfs_prop_to_name(ZFS_PROP_RECORDSIZE),
                                     recordsize_changed_cb, os);
                         }
+                        if (err == 0) {
+                                err = dsl_prop_register(ds,
+                                    zfs_prop_to_name(ZFS_PROP_WBC_MODE),
+                                    wbc_mode_changed, os);
+                        }
                 }
                 if (needlock)
                         dsl_pool_config_exit(dmu_objset_pool(os), FTAG);
                 if (err != 0) {
                         arc_buf_destroy(os->os_phys_buf, &os->os_phys_buf);
                         kmem_free(os, sizeof (objset_t));

@@ -493,11 +528,20 @@
                 os->os_dedup_verify = B_FALSE;
                 os->os_logbias = ZFS_LOGBIAS_LATENCY;
                 os->os_sync = ZFS_SYNC_STANDARD;
                 os->os_primary_cache = ZFS_CACHE_ALL;
                 os->os_secondary_cache = ZFS_CACHE_ALL;
+                os->os_zpl_meta_to_special = 0;
         }
+        /*
+         * These properties will be filled in by the logic in zfs_get_zplprop()
+         * when they are queried for the first time.
+         */
+        os->os_version = OBJSET_PROP_UNINITIALIZED;
+        os->os_normalization = OBJSET_PROP_UNINITIALIZED;
+        os->os_utf8only = OBJSET_PROP_UNINITIALIZED;
+        os->os_casesensitivity = OBJSET_PROP_UNINITIALIZED;
 
         if (ds == NULL || !ds->ds_is_snapshot)
                 os->os_zil_header = os->os_phys->os_zil_header;
         os->os_zil = zil_alloc(os, &os->os_zil_header);
 

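The hunk above marks four objset fields as uninitialized so that they can be filled in on first query. A minimal sketch of that sentinel-based lazy lookup is shown below. It does not reproduce zfs_get_zplprop(): the sentinel value, the `demo_` names, and the fixed value returned by the slow path are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical sentinel. The real value of OBJSET_PROP_UNINITIALIZED
 * is not shown in this diff.
 */
#define DEMO_PROP_UNINITIALIZED UINT64_MAX

typedef struct demo_objset {
        uint64_t version;       /* cached property value */
        int slow_lookups;       /* how often the slow path ran */
} demo_objset_t;

/* Stand-in for the expensive on-disk lookup; returns a made-up value. */
static uint64_t
demo_slow_lookup(demo_objset_t *os)
{
        os->slow_lookups++;
        return (5);
}

/* Open-time setup: mark the property as not yet read. */
static void
demo_objset_open(demo_objset_t *os)
{
        assert(os != NULL);
        os->version = DEMO_PROP_UNINITIALIZED;
        os->slow_lookups = 0;
}

/* First call pays for the lookup; later calls return the cached value. */
static uint64_t
demo_get_version(demo_objset_t *os)
{
        assert(os != NULL);
        if (os->version == DEMO_PROP_UNINITIALIZED)
                os->version = demo_slow_lookup(os);
        return (os->version);
}
```

The sentinel must be a value the property can never legitimately take, and a real implementation would also need whatever locking the surrounding code requires; both points are outside this sketch.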
@@ -712,11 +756,11 @@
                  */
                 if (dnode_add_ref(dn, FTAG)) {
                         list_insert_after(&os->os_dnodes, dn, &dn_marker);
                         mutex_exit(&os->os_lock);
 
-                        dnode_evict_dbufs(dn);
+                        dnode_evict_dbufs(dn, DBUF_EVICT_ALL);
                         dnode_rele(dn, FTAG);
 
                         mutex_enter(&os->os_lock);
                         dn = list_next(&os->os_dnodes, &dn_marker);
                         list_remove(&os->os_dnodes, &dn_marker);

@@ -725,14 +769,14 @@
                 }
         }
         mutex_exit(&os->os_lock);
 
         if (DMU_USERUSED_DNODE(os) != NULL) {
-                dnode_evict_dbufs(DMU_GROUPUSED_DNODE(os));
-                dnode_evict_dbufs(DMU_USERUSED_DNODE(os));
+                dnode_evict_dbufs(DMU_GROUPUSED_DNODE(os), DBUF_EVICT_ALL);
+                dnode_evict_dbufs(DMU_USERUSED_DNODE(os), DBUF_EVICT_ALL);
         }
-        dnode_evict_dbufs(DMU_META_DNODE(os));
+        dnode_evict_dbufs(DMU_META_DNODE(os), DBUF_EVICT_ALL);
 }
 
 /*
  * Objset eviction processing is split into two pieces.
  * The first marks the objset as evicting, evicts any dbufs that

@@ -1062,106 +1106,11 @@
         return (dsl_sync_task(clone,
             dmu_objset_clone_check, dmu_objset_clone_sync, &doca,
             5, ZFS_SPACE_CHECK_NORMAL));
 }
 
-static int
-dmu_objset_remap_indirects_impl(objset_t *os, uint64_t last_removed_txg)
-{
-        int error = 0;
-        uint64_t object = 0;
-        while ((error = dmu_object_next(os, &object, B_FALSE, 0)) == 0) {
-                error = dmu_object_remap_indirects(os, object,
-                    last_removed_txg);
-                /*
-                 * If the ZPL removed the object before we managed to dnode_hold
-                 * it, we would get an ENOENT. If the ZPL declares its intent
-                 * to remove the object (dnode_free) before we manage to
-                 * dnode_hold it, we would get an EEXIST. In either case, we
-                 * want to continue remapping the other objects in the objset;
-                 * in all other cases, we want to break early.
-                 */
-                if (error != 0 && error != ENOENT && error != EEXIST) {
-                        break;
-                }
-        }
-        if (error == ESRCH) {
-                error = 0;
-        }
-        return (error);
-}
-
 int
-dmu_objset_remap_indirects(const char *fsname)
-{
-        int error = 0;
-        objset_t *os = NULL;
-        uint64_t last_removed_txg;
-        uint64_t remap_start_txg;
-        dsl_dir_t *dd;
-
-        error = dmu_objset_hold(fsname, FTAG, &os);
-        if (error != 0) {
-                return (error);
-        }
-        dd = dmu_objset_ds(os)->ds_dir;
-
-        if (!spa_feature_is_enabled(dmu_objset_spa(os),
-            SPA_FEATURE_OBSOLETE_COUNTS)) {
-                dmu_objset_rele(os, FTAG);
-                return (SET_ERROR(ENOTSUP));
-        }
-
-        if (dsl_dataset_is_snapshot(dmu_objset_ds(os))) {
-                dmu_objset_rele(os, FTAG);
-                return (SET_ERROR(EINVAL));
-        }
-
-        /*
-         * If there has not been a removal, we're done.
-         */
-        last_removed_txg = spa_get_last_removal_txg(dmu_objset_spa(os));
-        if (last_removed_txg == -1ULL) {
-                dmu_objset_rele(os, FTAG);
-                return (0);
-        }
-
-        /*
-         * If we have remapped since the last removal, we're done.
-         */
-        if (dsl_dir_is_zapified(dd)) {
-                uint64_t last_remap_txg;
-                if (zap_lookup(spa_meta_objset(dmu_objset_spa(os)),
-                    dd->dd_object, DD_FIELD_LAST_REMAP_TXG,
-                    sizeof (last_remap_txg), 1, &last_remap_txg) == 0 &&
-                    last_remap_txg > last_removed_txg) {
-                        dmu_objset_rele(os, FTAG);
-                        return (0);
-                }
-        }
-
-        dsl_dataset_long_hold(dmu_objset_ds(os), FTAG);
-        dsl_pool_rele(dmu_objset_pool(os), FTAG);
-
-        remap_start_txg = spa_last_synced_txg(dmu_objset_spa(os));
-        error = dmu_objset_remap_indirects_impl(os, last_removed_txg);
-        if (error == 0) {
-                /*
-                 * We update the last_remap_txg to be the start txg so that
-                 * we can guarantee that every block older than last_remap_txg
-                 * that can be remapped has been remapped.
-                 */
-                error = dsl_dir_update_last_remap_txg(dd, remap_start_txg);
-        }
-
-        dsl_dataset_long_rele(dmu_objset_ds(os), FTAG);
-        dsl_dataset_rele(dmu_objset_ds(os), FTAG);
-
-        return (error);
-}
-
-int
 dmu_objset_snapshot_one(const char *fsname, const char *snapname)
 {
         int err;
         char *longsnap = kmem_asprintf("%s@%s", fsname, snapname);
         nvlist_t *snaps = fnvlist_alloc();

@@ -1311,11 +1260,11 @@
         dmu_write_policy(os, NULL, 0, 0, &zp);
 
         zio = arc_write(pio, os->os_spa, tx->tx_txg,
             blkptr_copy, os->os_phys_buf, DMU_OS_IS_L2CACHEABLE(os),
             &zp, dmu_objset_write_ready, NULL, NULL, dmu_objset_write_done,
-            os, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb);
+            os, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb, NULL);
 
         /*
          * Sync special dnodes - the parent IO for the sync is the root block
          */
         DMU_META_DNODE(os)->dn_zio = zio;

@@ -1851,10 +1800,56 @@
         return (zap_lookup_norm(ds->ds_dir->dd_pool->dp_meta_objset,
             dsl_dataset_phys(ds)->ds_snapnames_zapobj, name, 8, 1, &ignored,
             MT_NORMALIZE, real, maxlen, conflict));
 }
 
+int
+dmu_clone_list_next(objset_t *os, int len, char *name,
+    uint64_t *idp, uint64_t *offp)
+{
+        dsl_dataset_t *ds = os->os_dsl_dataset, *clone;
+        zap_cursor_t cursor;
+        zap_attribute_t attr;
+        char buf[MAXNAMELEN];
+
+        ASSERT(dsl_pool_config_held(dmu_objset_pool(os)));
+
+        if (dsl_dataset_phys(ds)->ds_next_clones_obj == 0)
+                return (SET_ERROR(ENOENT));
+
+        zap_cursor_init_serialized(&cursor,
+            ds->ds_dir->dd_pool->dp_meta_objset,
+            dsl_dataset_phys(ds)->ds_next_clones_obj, *offp);
+
+        if (zap_cursor_retrieve(&cursor, &attr) != 0) {
+                zap_cursor_fini(&cursor);
+                return (SET_ERROR(ENOENT));
+        }
+
+        VERIFY0(dsl_dataset_hold_obj(ds->ds_dir->dd_pool,
+            attr.za_first_integer, FTAG, &clone));
+
+        dsl_dir_name(clone->ds_dir, buf);
+
+        dsl_dataset_rele(clone, FTAG);
+
+        if (strlen(buf) >= len) {
+                zap_cursor_fini(&cursor);
+                return (SET_ERROR(ENAMETOOLONG));
+        }
+
+        (void) strcpy(name, buf);
+        if (idp != NULL)
+                *idp = attr.za_first_integer;
+
+        zap_cursor_advance(&cursor);
+        *offp = zap_cursor_serialize(&cursor);
+        zap_cursor_fini(&cursor);
+
+        return (0);
+}
+
 int
 dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
     uint64_t *idp, uint64_t *offp, boolean_t *case_conflict)
 {
         dsl_dataset_t *ds = os->os_dsl_dataset;