Print this page
NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3562 filename normalization doesn't work for removes (sync with upstream)
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5272 KRRP: replicate snapshot properties
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-5318 Cleanup specialclass property (obsolete, not used) and fix related meta-to-special case
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5058 WBC: Race between the purging of window and opening new one
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-2830 ZFS smart compression
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
6495 Fix mutex leak in dmu_objset_find_dp
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Albert Lee <trisk@omniti.com>
6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
6160 /usr/lib/fs/zfs/bootinstall should use bootadm
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Reviewed by: Adam Števko <adam.stevko@gmail.com>
Reviewed by: Josef Sipek <jeffpc@josefsipek.net>
Approved by: Richard Lowe <richlowe@richlowe.net>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (NULL is not an int)
6171 dsl_prop_unregister() slows down dataset eviction.
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5981 Deadlock in dmu_objset_find_dp
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Robert Mustacchi <rm@joyent.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5610 zfs clone from different source and target pools produces coredump
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4028 use lz4 by default
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-3558 KRRP Integration
SUP-507 Delete or truncate of large files delayed on datasets with small recordsize
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Ilya Usvyatsky <ilya.usvyatsky@nexenta.com>
Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issues #7: Reconsile L2ARC and "special" use by datasets
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12619 rb4429 More dp->dp_config_rwlock holds

*** 51,68 **** #include <sys/zfs_ioctl.h> #include <sys/sa.h> #include <sys/zfs_onexit.h> #include <sys/dsl_destroy.h> #include <sys/vdev.h> ! #include <sys/zfeature.h> /* * Needed to close a window in dnode_move() that allows the objset to be freed * before it can be safely accessed. */ krwlock_t os_lock; /* * Tunable to overwrite the maximum number of threads for the parallization * of dmu_objset_find_dp, needed to speed up the import of pools with many * datasets. * Default is 4 times the number of leaf vdevs. --- 51,70 ---- #include <sys/zfs_ioctl.h> #include <sys/sa.h> #include <sys/zfs_onexit.h> #include <sys/dsl_destroy.h> #include <sys/vdev.h> ! #include <sys/wbc.h> /* * Needed to close a window in dnode_move() that allows the objset to be freed * before it can be safely accessed. */ krwlock_t os_lock; + extern kmem_cache_t *zfs_ds_collector_cache; + /* * Tunable to overwrite the maximum number of threads for the parallization * of dmu_objset_find_dp, needed to speed up the import of pools with many * datasets. * Default is 4 times the number of leaf vdevs.
*** 76,95 **** --- 78,110 ---- */ int dmu_rescan_dnode_threshold = 131072; static void dmu_objset_find_dp_cb(void *arg); + /* ARGSUSED */ + static int + zfs_ds_collector_constructor(void *ds_el, void *unused, int flags) + { + bzero(ds_el, sizeof (zfs_ds_collector_entry_t)); + return (0); + } + void dmu_objset_init(void) { + zfs_ds_collector_cache = kmem_cache_create("zfs_ds_collector_cache", + sizeof (zfs_ds_collector_entry_t), + 8, zfs_ds_collector_constructor, + NULL, NULL, NULL, NULL, 0); rw_init(&os_lock, NULL, RW_DEFAULT, NULL); } void dmu_objset_fini(void) { rw_destroy(&os_lock); + kmem_cache_destroy(zfs_ds_collector_cache); } spa_t * dmu_objset_spa(objset_t *os) {
*** 177,186 **** --- 192,209 ---- os->os_compress = zio_compress_select(os->os_spa, newval, ZIO_COMPRESS_ON); } static void + smartcomp_changed_cb(void *arg, uint64_t newval) + { + objset_t *os = arg; + + os->os_smartcomp_enabled = newval ? B_TRUE : B_FALSE; + } + + static void copies_changed_cb(void *arg, uint64_t newval) { objset_t *os = arg; /*
*** 231,246 **** /* * Inheritance and range checking should have been done by now. */ ASSERT(newval == ZFS_CACHE_ALL || newval == ZFS_CACHE_NONE || ! newval == ZFS_CACHE_METADATA); os->os_secondary_cache = newval; } static void sync_changed_cb(void *arg, uint64_t newval) { objset_t *os = arg; /* --- 254,277 ---- /* * Inheritance and range checking should have been done by now. */ ASSERT(newval == ZFS_CACHE_ALL || newval == ZFS_CACHE_NONE || ! newval == ZFS_CACHE_METADATA || newval == ZFS_CACHE_DATA); os->os_secondary_cache = newval; } static void + zpl_meta_placement_changed_cb(void *arg, uint64_t newval) + { + objset_t *os = arg; + + os->os_zpl_meta_to_special = newval; + } + + static void sync_changed_cb(void *arg, uint64_t newval) { objset_t *os = arg; /*
*** 347,367 **** objset_t *os; int i, err; ASSERT(ds == NULL || MUTEX_HELD(&ds->ds_opening_lock)); - /* - * The $ORIGIN dataset (if it exists) doesn't have an associated - * objset, so there's no reason to open it. The $ORIGIN dataset - * will not exist on pools older than SPA_VERSION_ORIGIN. - */ - if (ds != NULL && spa_get_dsl(spa) != NULL && - spa_get_dsl(spa)->dp_origin_snap != NULL) { - ASSERT3P(ds->ds_dir, !=, - spa_get_dsl(spa)->dp_origin_snap->ds_dir); - } - os = kmem_zalloc(sizeof (objset_t), KM_SLEEP); os->os_dsl_dataset = ds; os->os_spa = spa; os->os_rootbp = bp; if (!BP_IS_HOLE(os->os_rootbp)) { --- 378,387 ----
*** 432,441 **** --- 452,466 ---- if (err == 0) { err = dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_SECONDARYCACHE), secondary_cache_changed_cb, os); } + if (err == 0) { + err = dsl_prop_register(ds, + zfs_prop_to_name(ZFS_PROP_ZPL_META_TO_METADEV), + zpl_meta_placement_changed_cb, os); + } if (!ds->ds_is_snapshot) { if (err == 0) { err = dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_CHECKSUM), checksum_changed_cb, os);
*** 445,454 **** --- 470,484 ---- zfs_prop_to_name(ZFS_PROP_COMPRESSION), compression_changed_cb, os); } if (err == 0) { err = dsl_prop_register(ds, + zfs_prop_to_name(ZFS_PROP_SMARTCOMPRESSION), + smartcomp_changed_cb, os); + } + if (err == 0) { + err = dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_COPIES), copies_changed_cb, os); } if (err == 0) { err = dsl_prop_register(ds,
*** 474,484 **** --- 504,519 ---- if (err == 0) { err = dsl_prop_register(ds, zfs_prop_to_name(ZFS_PROP_RECORDSIZE), recordsize_changed_cb, os); } + if (err == 0) { + err = dsl_prop_register(ds, + zfs_prop_to_name(ZFS_PROP_WBC_MODE), + wbc_mode_changed, os); } + } if (needlock) dsl_pool_config_exit(dmu_objset_pool(os), FTAG); if (err != 0) { arc_buf_destroy(os->os_phys_buf, &os->os_phys_buf); kmem_free(os, sizeof (objset_t));
*** 493,503 **** --- 528,547 ---- os->os_dedup_verify = B_FALSE; os->os_logbias = ZFS_LOGBIAS_LATENCY; os->os_sync = ZFS_SYNC_STANDARD; os->os_primary_cache = ZFS_CACHE_ALL; os->os_secondary_cache = ZFS_CACHE_ALL; + os->os_zpl_meta_to_special = 0; } + /* + * These properties will be filled in by the logic in zfs_get_zplprop() + * when they are queried for the first time. + */ + os->os_version = OBJSET_PROP_UNINITIALIZED; + os->os_normalization = OBJSET_PROP_UNINITIALIZED; + os->os_utf8only = OBJSET_PROP_UNINITIALIZED; + os->os_casesensitivity = OBJSET_PROP_UNINITIALIZED; if (ds == NULL || !ds->ds_is_snapshot) os->os_zil_header = os->os_phys->os_zil_header; os->os_zil = zil_alloc(os, &os->os_zil_header);
*** 712,722 **** */ if (dnode_add_ref(dn, FTAG)) { list_insert_after(&os->os_dnodes, dn, &dn_marker); mutex_exit(&os->os_lock); ! dnode_evict_dbufs(dn); dnode_rele(dn, FTAG); mutex_enter(&os->os_lock); dn = list_next(&os->os_dnodes, &dn_marker); list_remove(&os->os_dnodes, &dn_marker); --- 756,766 ---- */ if (dnode_add_ref(dn, FTAG)) { list_insert_after(&os->os_dnodes, dn, &dn_marker); mutex_exit(&os->os_lock); ! dnode_evict_dbufs(dn, DBUF_EVICT_ALL); dnode_rele(dn, FTAG); mutex_enter(&os->os_lock); dn = list_next(&os->os_dnodes, &dn_marker); list_remove(&os->os_dnodes, &dn_marker);
*** 725,738 **** } } mutex_exit(&os->os_lock); if (DMU_USERUSED_DNODE(os) != NULL) { ! dnode_evict_dbufs(DMU_GROUPUSED_DNODE(os)); ! dnode_evict_dbufs(DMU_USERUSED_DNODE(os)); } ! dnode_evict_dbufs(DMU_META_DNODE(os)); } /* * Objset eviction processing is split into into two pieces. * The first marks the objset as evicting, evicts any dbufs that --- 769,782 ---- } } mutex_exit(&os->os_lock); if (DMU_USERUSED_DNODE(os) != NULL) { ! dnode_evict_dbufs(DMU_GROUPUSED_DNODE(os), DBUF_EVICT_ALL); ! dnode_evict_dbufs(DMU_USERUSED_DNODE(os), DBUF_EVICT_ALL); } ! dnode_evict_dbufs(DMU_META_DNODE(os), DBUF_EVICT_ALL); } /* * Objset eviction processing is split into into two pieces. * The first marks the objset as evicting, evicts any dbufs that
*** 1062,1167 **** return (dsl_sync_task(clone, dmu_objset_clone_check, dmu_objset_clone_sync, &doca, 5, ZFS_SPACE_CHECK_NORMAL)); } - static int - dmu_objset_remap_indirects_impl(objset_t *os, uint64_t last_removed_txg) - { - int error = 0; - uint64_t object = 0; - while ((error = dmu_object_next(os, &object, B_FALSE, 0)) == 0) { - error = dmu_object_remap_indirects(os, object, - last_removed_txg); - /* - * If the ZPL removed the object before we managed to dnode_hold - * it, we would get an ENOENT. If the ZPL declares its intent - * to remove the object (dnode_free) before we manage to - * dnode_hold it, we would get an EEXIST. In either case, we - * want to continue remapping the other objects in the objset; - * in all other cases, we want to break early. - */ - if (error != 0 && error != ENOENT && error != EEXIST) { - break; - } - } - if (error == ESRCH) { - error = 0; - } - return (error); - } - int - dmu_objset_remap_indirects(const char *fsname) - { - int error = 0; - objset_t *os = NULL; - uint64_t last_removed_txg; - uint64_t remap_start_txg; - dsl_dir_t *dd; - - error = dmu_objset_hold(fsname, FTAG, &os); - if (error != 0) { - return (error); - } - dd = dmu_objset_ds(os)->ds_dir; - - if (!spa_feature_is_enabled(dmu_objset_spa(os), - SPA_FEATURE_OBSOLETE_COUNTS)) { - dmu_objset_rele(os, FTAG); - return (SET_ERROR(ENOTSUP)); - } - - if (dsl_dataset_is_snapshot(dmu_objset_ds(os))) { - dmu_objset_rele(os, FTAG); - return (SET_ERROR(EINVAL)); - } - - /* - * If there has not been a removal, we're done. - */ - last_removed_txg = spa_get_last_removal_txg(dmu_objset_spa(os)); - if (last_removed_txg == -1ULL) { - dmu_objset_rele(os, FTAG); - return (0); - } - - /* - * If we have remapped since the last removal, we're done. - */ - if (dsl_dir_is_zapified(dd)) { - uint64_t last_remap_txg; - if (zap_lookup(spa_meta_objset(dmu_objset_spa(os)), - dd->dd_object, DD_FIELD_LAST_REMAP_TXG, - sizeof (last_remap_txg), 1, &last_remap_txg) == 0 && - last_remap_txg > last_removed_txg) { - dmu_objset_rele(os, FTAG); - return (0); - } - } - - dsl_dataset_long_hold(dmu_objset_ds(os), FTAG); - dsl_pool_rele(dmu_objset_pool(os), FTAG); - - remap_start_txg = spa_last_synced_txg(dmu_objset_spa(os)); - error = dmu_objset_remap_indirects_impl(os, last_removed_txg); - if (error == 0) { - /* - * We update the last_remap_txg to be the start txg so that - * we can guarantee that every block older than last_remap_txg - * that can be remapped has been remapped. - */ - error = dsl_dir_update_last_remap_txg(dd, remap_start_txg); - } - - dsl_dataset_long_rele(dmu_objset_ds(os), FTAG); - dsl_dataset_rele(dmu_objset_ds(os), FTAG); - - return (error); - } - - int dmu_objset_snapshot_one(const char *fsname, const char *snapname) { int err; char *longsnap = kmem_asprintf("%s@%s", fsname, snapname); nvlist_t *snaps = fnvlist_alloc(); --- 1106,1116 ----
*** 1311,1321 **** dmu_write_policy(os, NULL, 0, 0, &zp); zio = arc_write(pio, os->os_spa, tx->tx_txg, blkptr_copy, os->os_phys_buf, DMU_OS_IS_L2CACHEABLE(os), &zp, dmu_objset_write_ready, NULL, NULL, dmu_objset_write_done, ! os, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb); /* * Sync special dnodes - the parent IO for the sync is the root block */ DMU_META_DNODE(os)->dn_zio = zio; --- 1260,1270 ---- dmu_write_policy(os, NULL, 0, 0, &zp); zio = arc_write(pio, os->os_spa, tx->tx_txg, blkptr_copy, os->os_phys_buf, DMU_OS_IS_L2CACHEABLE(os), &zp, dmu_objset_write_ready, NULL, NULL, dmu_objset_write_done, ! os, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb, NULL); /* * Sync special dnodes - the parent IO for the sync is the root block */ DMU_META_DNODE(os)->dn_zio = zio;
*** 1851,1860 **** --- 1800,1855 ---- return (zap_lookup_norm(ds->ds_dir->dd_pool->dp_meta_objset, dsl_dataset_phys(ds)->ds_snapnames_zapobj, name, 8, 1, &ignored, MT_NORMALIZE, real, maxlen, conflict)); } + int + dmu_clone_list_next(objset_t *os, int len, char *name, + uint64_t *idp, uint64_t *offp) + { + dsl_dataset_t *ds = os->os_dsl_dataset, *clone; + zap_cursor_t cursor; + zap_attribute_t attr; + char buf[MAXNAMELEN]; + + ASSERT(dsl_pool_config_held(dmu_objset_pool(os))); + + if (dsl_dataset_phys(ds)->ds_next_clones_obj == 0) + return (SET_ERROR(ENOENT)); + + zap_cursor_init_serialized(&cursor, + ds->ds_dir->dd_pool->dp_meta_objset, + dsl_dataset_phys(ds)->ds_next_clones_obj, *offp); + + if (zap_cursor_retrieve(&cursor, &attr) != 0) { + zap_cursor_fini(&cursor); + return (SET_ERROR(ENOENT)); + } + + VERIFY0(dsl_dataset_hold_obj(ds->ds_dir->dd_pool, + attr.za_first_integer, FTAG, &clone)); + + dsl_dir_name(clone->ds_dir, buf); + + dsl_dataset_rele(clone, FTAG); + + if (strlen(buf) >= len) { + zap_cursor_fini(&cursor); + return (SET_ERROR(ENAMETOOLONG)); + } + + (void) strcpy(name, buf); + if (idp != NULL) + *idp = attr.za_first_integer; + + zap_cursor_advance(&cursor); + *offp = zap_cursor_serialize(&cursor); + zap_cursor_fini(&cursor); + + return (0); + } + int dmu_snapshot_list_next(objset_t *os, int namelen, char *name, uint64_t *idp, uint64_t *offp, boolean_t *case_conflict) { dsl_dataset_t *ds = os->os_dsl_dataset;