Print this page
NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3214 remove cos object type from dmu.h
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-5366 Race between unique_insert() and unique_remove() causes ZFS fsid change
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Vatca <dan.vatca@gmail.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5987 zfs prefetch code needs work
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5692 expose the number of hole blocks in a file
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-3669 Faults for fans that don't exist
Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com>
NEX-3891 Hide the snapshots that belong to in-kernel autosnap-service
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3212 remove vdev prop object type from dmu.h
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issue #40: ZDB shouldn't crash with new code
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
Fixup merge results
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
Bug 10481 - Dry run option in 'zfs send' isn't the same as in NexentaStor 3.1

@@ -20,11 +20,11 @@
  */
 
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
- * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
+ * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
  * Copyright (c) 2012, Joyent, Inc. All rights reserved.
  * Copyright 2013 DEY Storage Systems, Inc.
  * Copyright 2014 HybridCluster. All rights reserved.
  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
  * Copyright 2013 Saso Kiselkov. All rights reserved.

@@ -107,11 +107,12 @@
 #define DMU_OT_BYTESWAP_MASK 0x3f
 
 /*
  * Defines a uint8_t object type. Object types specify if the data
  * in the object is metadata (boolean) and how to byteswap the data
- * (dmu_object_byteswap_t).
+ * (dmu_object_byteswap_t). All of the types created by this method
+ * are cached in the dbuf metadata cache.
  */
 #define DMU_OT(byteswap, metadata) \
         (DMU_OT_NEWTYPE | \
         ((metadata) ? DMU_OT_METADATA : 0) | \
         ((byteswap) & DMU_OT_BYTESWAP_MASK))

@@ -122,10 +123,13 @@
 
 #define DMU_OT_IS_METADATA(ot) (((ot) & DMU_OT_NEWTYPE) ? \
         ((ot) & DMU_OT_METADATA) : \
         dmu_ot[(ot)].ot_metadata)
 
+#define DMU_OT_IS_METADATA_CACHED(ot) (((ot) & DMU_OT_NEWTYPE) ? \
+        B_TRUE : dmu_ot[(ot)].ot_dbuf_metadata_cache)
+
 /*
  * These object types use bp_fill != 1 for their L0 bp's. Therefore they can't
  * have their data embedded (i.e. use a BP_IS_EMBEDDED() bp), because bp_fill
  * is repurposed for embedded BPs.
  */

@@ -229,19 +233,30 @@
         DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE),
         DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE),
         DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE),
 } dmu_object_type_t;
 
+typedef enum txg_how {
+        TXG_WAIT = 1,
+        TXG_NOWAIT,
+        TXG_WAITED,
+} txg_how_t;
+
 /*
- * These flags are intended to be used to specify the "txg_how"
- * parameter when calling the dmu_tx_assign() function. See the comment
- * above dmu_tx_assign() for more details on the meaning of these flags.
+ * Selected classes of metadata
  */
-#define TXG_NOWAIT      (0ULL)
-#define TXG_WAIT        (1ULL<<0)
-#define TXG_NOTHROTTLE  (1ULL<<1)
+#define DMU_OT_IS_DDT_META(type)        \
+        ((type == DMU_OT_DDT_ZAP) ||    \
+        (type == DMU_OT_DDT_STATS))
 
+#define DMU_OT_IS_ZPL_META(type)                \
+        ((type == DMU_OT_ZNODE) ||              \
+        (type == DMU_OT_OLDACL) ||              \
+        (type == DMU_OT_DIRECTORY_CONTENTS) ||  \
+        (type == DMU_OT_MASTER_NODE) ||         \
+        (type == DMU_OT_UNLINKED_SET))
+
 void byteswap_uint64_array(void *buf, size_t size);
 void byteswap_uint32_array(void *buf, size_t size);
 void byteswap_uint16_array(void *buf, size_t size);
 void byteswap_uint8_array(void *buf, size_t size);
 void zap_byteswap(void *buf, size_t size);

@@ -289,11 +304,10 @@
 int dmu_objset_find(char *name, int func(const char *, void *), void *arg,
     int flags);
 void dmu_objset_byteswap(void *buf, size_t size);
 int dsl_dataset_rename_snapshot(const char *fsname,
     const char *oldsnapname, const char *newsnapname, boolean_t recursive);
-int dmu_objset_remap_indirects(const char *fsname);
 
 typedef struct dmu_buf {
         uint64_t db_object;             /* object that this buffer is part of */
         uint64_t db_offset;             /* byte offset in this object */
         uint64_t db_size;               /* size of buffer in bytes */

@@ -326,14 +340,16 @@
 #define DMU_POOL_FREE_BPOBJ             "free_bpobj"
 #define DMU_POOL_BPTREE_OBJ             "bptree_obj"
 #define DMU_POOL_EMPTY_BPOBJ            "empty_bpobj"
 #define DMU_POOL_CHECKSUM_SALT          "org.illumos:checksum_salt"
 #define DMU_POOL_VDEV_ZAP_MAP           "com.delphix:vdev_zap_map"
-#define DMU_POOL_REMOVING               "com.delphix:removing"
-#define DMU_POOL_OBSOLETE_BPOBJ         "com.delphix:obsolete_bpobj"
-#define DMU_POOL_CONDENSING_INDIRECT    "com.delphix:condensing_indirect"
 
+#define DMU_POOL_COS_PROPS              "cos_props"
+#define DMU_POOL_VDEV_PROPS             "vdev_props"
+#define DMU_POOL_TRIM_START_TIME        "trim_start_time"
+#define DMU_POOL_TRIM_STOP_TIME         "trim_stop_time"
+
 /*
  * Allocate an object from this objset.  The range of object numbers
  * available is (0, DN_MAX_OBJECT).  Object 0 is the meta-dnode.
  *
  * The transaction must be assigned to a txg.  The newly allocated

@@ -412,12 +428,10 @@
  * apply to all newly written blocks; existing blocks will not be affected.
  */
 void dmu_object_set_compress(objset_t *os, uint64_t object, uint8_t compress,
     dmu_tx_t *tx);
 
-int dmu_object_remap_indirects(objset_t *os, uint64_t object, uint64_t txg);
-
 void
 dmu_write_embedded(objset_t *os, uint64_t object, uint64_t offset,
     void *data, uint8_t etype, uint8_t comp, int uncompressed_size,
     int compressed_size, int byteorder, dmu_tx_t *tx);
 

@@ -426,18 +440,30 @@
  */
 #define WP_NOFILL       0x1
 #define WP_DMU_SYNC     0x2
 #define WP_SPILL        0x4
 
+#define WP_SPECIALCLASS_SHIFT   (16)
+#define WP_SPECIALCLASS_BITS    (1) /* 1 bits per storage class */
+#define WP_SPECIALCLASS_MASK    (((1 << WP_SPECIALCLASS_BITS) - 1) \
+        << WP_SPECIALCLASS_SHIFT)
+
+#define WP_SET_SPECIALCLASS(flags, sclass)      { \
+        flags |= ((sclass << WP_SPECIALCLASS_SHIFT) & WP_SPECIALCLASS_MASK); \
+}
+
+#define WP_GET_SPECIALCLASS(flags) \
+        ((flags & WP_SPECIALCLASS_MASK) >> WP_SPECIALCLASS_SHIFT)
+
 void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp,
     struct zio_prop *zp);
 /*
  * The bonus data is accessed more or less like a regular buffer.
  * You must dmu_bonus_hold() to get the buffer, which will give you a
  * dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
- * data.  As with any normal buffer, you must call dmu_buf_will_dirty()
- * before modifying it, and the
+ * data.  As with any normal buffer, you must call dmu_buf_read() to
+ * read db_data, dmu_buf_will_dirty() before modifying it, and the
  * object must be held in an assigned transaction before calling
  * dmu_buf_will_dirty.  You may use dmu_buf_set_user() on the bonus
  * buffer as well.  You must release your hold with dmu_buf_rele().
  *
  * Returns ENOENT, EIO, or 0.

@@ -650,10 +676,11 @@
  * The transaction (tx) must be assigned to a txg (ie. you've called
  * dmu_tx_assign()).  The buffer's object must be held in the tx
  * (ie. you've called dmu_tx_hold_object(tx, db->db_object)).
  */
 void dmu_buf_will_dirty(dmu_buf_t *db, dmu_tx_t *tx);
+void dmu_buf_will_dirty_sc(dmu_buf_t *db, dmu_tx_t *tx, boolean_t sc);
 
 /*
  * You must create a transaction, then hold the objects which you will
  * (or might) modify as part of this transaction.  Then you must assign
  * the transaction to a transaction group.  Once the transaction has

@@ -680,21 +707,20 @@
     int len);
 void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
     uint64_t len);
 void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
     uint64_t len);
-void dmu_tx_hold_remap_l1indirect(dmu_tx_t *tx, uint64_t object);
 void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
 void dmu_tx_hold_zap_by_dnode(dmu_tx_t *tx, dnode_t *dn, int add,
     const char *name);
 void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
 void dmu_tx_hold_bonus_by_dnode(dmu_tx_t *tx, dnode_t *dn);
 void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
 void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
 void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
 void dmu_tx_abort(dmu_tx_t *tx);
-int dmu_tx_assign(dmu_tx_t *tx, uint64_t txg_how);
+int dmu_tx_assign(dmu_tx_t *tx, enum txg_how txg_how);
 void dmu_tx_wait(dmu_tx_t *tx);
 void dmu_tx_commit(dmu_tx_t *tx);
 void dmu_tx_mark_netfree(dmu_tx_t *tx);
 
 /*

@@ -796,10 +822,11 @@
 typedef void arc_byteswap_func_t(void *buf, size_t size);
 
 typedef struct dmu_object_type_info {
         dmu_object_byteswap_t   ot_byteswap;
         boolean_t               ot_metadata;
+        boolean_t               ot_dbuf_metadata_cache;
         char                    *ot_name;
 } dmu_object_type_info_t;
 
 typedef struct dmu_object_byteswap_info {
         arc_byteswap_func_t     *ob_func;

@@ -832,10 +859,11 @@
         uint64_t dds_num_clones; /* number of clones of this */
         uint64_t dds_creation_txg;
         uint64_t dds_guid;
         dmu_objset_type_t dds_type;
         uint8_t dds_is_snapshot;
+        uint8_t dds_is_autosnapshot;
         uint8_t dds_inconsistent;
         char dds_origin[ZFS_MAX_DATASET_NAME_LEN];
 } dmu_objset_stats_t;
 
 /*

@@ -885,10 +913,12 @@
 extern void dmu_objset_name(objset_t *os, char *buf);
 extern dmu_objset_type_t dmu_objset_type(objset_t *os);
 extern uint64_t dmu_objset_id(objset_t *os);
 extern zfs_sync_type_t dmu_objset_syncprop(objset_t *os);
 extern zfs_logbias_op_t dmu_objset_logbias(objset_t *os);
+int dmu_clone_list_next(objset_t *os, int len, char *name,
+    uint64_t *idp, uint64_t *offp);
 extern int dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
     uint64_t *id, uint64_t *offp, boolean_t *case_conflict);
 extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
     int maxlen, boolean_t *conflict);
 extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,