NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3214 remove cos object type from dmu.h
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-5366 Race between unique_insert() and unique_remove() causes ZFS fsid change
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Vatca <dan.vatca@gmail.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5987 zfs prefetch code needs work
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5692 expose the number of hole blocks in a file
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-3669 Faults for fans that don't exist
Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com>
NEX-3891 Hide the snapshots that belong to in-kernel autosnap-service
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3212 remove vdev prop object type from dmu.h
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issue #40: ZDB shouldn't crash with new code
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
Fixup merge results
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
Bug 10481 - Dry run option in 'zfs send' isn't the same as in NexentaStor 3.1
        
*** 20,30 ****
   */
  
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
   * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
!  * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
   * Copyright (c) 2012, Joyent, Inc. All rights reserved.
   * Copyright 2013 DEY Storage Systems, Inc.
   * Copyright 2014 HybridCluster. All rights reserved.
   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
   * Copyright 2013 Saso Kiselkov. All rights reserved.
--- 20,30 ----
   */
  
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
   * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
!  * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
   * Copyright (c) 2012, Joyent, Inc. All rights reserved.
   * Copyright 2013 DEY Storage Systems, Inc.
   * Copyright 2014 HybridCluster. All rights reserved.
   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
   * Copyright 2013 Saso Kiselkov. All rights reserved.
*** 107,117 ****
  #define DMU_OT_BYTESWAP_MASK 0x3f
  
  /*
   * Defines a uint8_t object type. Object types specify if the data
   * in the object is metadata (boolean) and how to byteswap the data
!  * (dmu_object_byteswap_t).
   */
  #define DMU_OT(byteswap, metadata) \
          (DMU_OT_NEWTYPE | \
          ((metadata) ? DMU_OT_METADATA : 0) | \
          ((byteswap) & DMU_OT_BYTESWAP_MASK))
--- 107,118 ----
  #define DMU_OT_BYTESWAP_MASK 0x3f
  
  /*
   * Defines a uint8_t object type. Object types specify if the data
   * in the object is metadata (boolean) and how to byteswap the data
!  * (dmu_object_byteswap_t). All of the types created by this method
!  * are cached in the dbuf metadata cache.
   */
  #define DMU_OT(byteswap, metadata) \
          (DMU_OT_NEWTYPE | \
          ((metadata) ? DMU_OT_METADATA : 0) | \
          ((byteswap) & DMU_OT_BYTESWAP_MASK))
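The sketch below shows how a DMU_OT() type byte is packed and then unpacked.

It assumes DMU_OT_NEWTYPE = 0x80 and DMU_OT_METADATA = 0x40. These are the values defined elsewhere in illumos dmu.h, but they are not shown in this diff. Only DMU_OT_BYTESWAP_MASK (0x3f) appears above. The ot_* helper functions are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed values; only DMU_OT_BYTESWAP_MASK is shown in the diff. */
#define	DMU_OT_NEWTYPE		0x80
#define	DMU_OT_METADATA		0x40
#define	DMU_OT_BYTESWAP_MASK	0x3f

/* Same expansion as the DMU_OT() macro in dmu.h. */
#define	DMU_OT(byteswap, metadata) \
	(DMU_OT_NEWTYPE | \
	((metadata) ? DMU_OT_METADATA : 0) | \
	((byteswap) & DMU_OT_BYTESWAP_MASK))

/* Hypothetical decoder: is this a self-describing (new-style) type? */
static int
ot_is_newtype(uint8_t ot)
{
	return ((ot & DMU_OT_NEWTYPE) != 0);
}

/* Hypothetical decoder: metadata bit of a new-style type byte. */
static int
ot_is_metadata(uint8_t ot)
{
	assert(ot_is_newtype(ot));
	return ((ot & DMU_OT_METADATA) != 0);
}

/* Hypothetical decoder: byteswap index kept in the low six bits. */
static unsigned
ot_byteswap(uint8_t ot)
{
	assert(ot_is_newtype(ot));
	return (ot & DMU_OT_BYTESWAP_MASK);
}
```

For example, DMU_OT(3, B_TRUE) packs to 0xC3. That value combines three fields:
- the new-type bit,
- the metadata bit,
- byteswap index 3 in the low six bits.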
*** 122,131 ****
--- 123,135 ----
  
  #define DMU_OT_IS_METADATA(ot) (((ot) & DMU_OT_NEWTYPE) ? \
          ((ot) & DMU_OT_METADATA) : \
          dmu_ot[(ot)].ot_metadata)
  
+ #define DMU_OT_IS_METADATA_CACHED(ot) (((ot) & DMU_OT_NEWTYPE) ? \
+         B_TRUE : dmu_ot[(ot)].ot_dbuf_metadata_cache)
+ 
  /*
   * These object types use bp_fill != 1 for their L0 bp's. Therefore they can't
   * have their data embedded (i.e. use a BP_IS_EMBEDDED() bp), because bp_fill
   * is repurposed for embedded BPs.
   */
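The new DMU_OT_IS_METADATA_CACHED() macro short-circuits for self-describing types. It consults the per-type table only for legacy types.

The sketch below reproduces that logic against a hypothetical stand-in for dmu_ot[]. The following are all assumptions, not the real definitions:
- the table contents,
- the boolean_t definition,
- DMU_OT_NEWTYPE = 0x80.

```c
#include <assert.h>

typedef enum { B_FALSE = 0, B_TRUE = 1 } boolean_t;

#define	DMU_OT_NEWTYPE	0x80	/* assumed value, not shown in the diff */

/* Mirrors the relevant fields of dmu_object_type_info_t. */
typedef struct {
	boolean_t	ot_metadata;
	boolean_t	ot_dbuf_metadata_cache;
	const char	*ot_name;
} ot_info_t;

/*
 * Hypothetical stand-in for dmu_ot[]; sized to 256 so any uint8_t index
 * is in bounds. Unlisted entries are zero (not metadata, not cached).
 */
static const ot_info_t dmu_ot[256] = {
	[0] = { B_FALSE, B_FALSE, "example plain data" },
	[1] = { B_TRUE, B_TRUE, "example cached metadata" },
	[2] = { B_TRUE, B_FALSE, "example uncached metadata" },
};

/* Same logic as the macro added in the diff. */
#define	DMU_OT_IS_METADATA_CACHED(ot) (((ot) & DMU_OT_NEWTYPE) ? \
	B_TRUE : dmu_ot[(ot)].ot_dbuf_metadata_cache)
```

A new-style type is reported as cached even when its metadata bit is clear (for example 0x84). This matches the updated DMU_OT() comment: every type built by that macro goes into the dbuf metadata cache.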
*** 229,247 ****
          DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE),
          DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE),
          DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE),
  } dmu_object_type_t;
  
  /*
!  * These flags are intended to be used to specify the "txg_how"
!  * parameter when calling the dmu_tx_assign() function. See the comment
!  * above dmu_tx_assign() for more details on the meaning of these flags.
   */
! #define TXG_NOWAIT      (0ULL)
! #define TXG_WAIT        (1ULL<<0)
! #define TXG_NOTHROTTLE  (1ULL<<1)
  
  void byteswap_uint64_array(void *buf, size_t size);
  void byteswap_uint32_array(void *buf, size_t size);
  void byteswap_uint16_array(void *buf, size_t size);
  void byteswap_uint8_array(void *buf, size_t size);
  void zap_byteswap(void *buf, size_t size);
--- 233,262 ----
          DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE),
          DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE),
          DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE),
  } dmu_object_type_t;
  
+ typedef enum txg_how {
+         TXG_WAIT = 1,
+         TXG_NOWAIT,
+         TXG_WAITED,
+ } txg_how_t;
+ 
  /*
!  * Selected classes of metadata
   */
! #define DMU_OT_IS_DDT_META(type)        \
!         ((type == DMU_OT_DDT_ZAP) ||    \
!         (type == DMU_OT_DDT_STATS))
  
+ #define DMU_OT_IS_ZPL_META(type)                \
+         ((type == DMU_OT_ZNODE) ||              \
+         (type == DMU_OT_OLDACL) ||              \
+         (type == DMU_OT_DIRECTORY_CONTENTS) ||  \
+         (type == DMU_OT_MASTER_NODE) ||         \
+         (type == DMU_OT_UNLINKED_SET))
+ 
  void byteswap_uint64_array(void *buf, size_t size);
  void byteswap_uint32_array(void *buf, size_t size);
  void byteswap_uint16_array(void *buf, size_t size);
  void byteswap_uint8_array(void *buf, size_t size);
  void zap_byteswap(void *buf, size_t size);
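The right-hand side replaces the TXG_* bit flags with a txg_how_t enum. As a result:
- TXG_NOWAIT changes from 0 to 2;
- TXG_NOTHROTTLE disappears;
- TXG_WAITED (3) appears.

The sketch below shows the caller-side retry loop this enum is meant for. As we read the dmu_tx_assign() comment in dmu_tx.c, TXG_WAITED behaves like TXG_NOWAIT. The difference is that it records that dmu_tx_wait() has already been called for this operation.

The following are hypothetical stand-ins for the real DMU calls and for ERESTART:
- mock_tx_assign()
- mock_tx_wait()
- mock_tx_t
- MOCK_ERESTART

```c
#include <assert.h>

/* The txg_how enum as declared on the right-hand side of the diff. */
typedef enum txg_how {
	TXG_WAIT = 1,
	TXG_NOWAIT,
	TXG_WAITED,
} txg_how_t;

/* Hypothetical stand-in for ERESTART. */
#define	MOCK_ERESTART	(-1)

/* Hypothetical transaction: txg_full simulates a full open txg. */
typedef struct mock_tx {
	int txg_full;
	int attempts;
	int waits;
} mock_tx_t;

/* Mock of dmu_tx_assign(): only TXG_WAIT blocks until a new txg opens. */
static int
mock_tx_assign(mock_tx_t *tx, txg_how_t how)
{
	tx->attempts++;
	if (!tx->txg_full)
		return (0);
	if (how == TXG_WAIT) {
		tx->txg_full = 0;	/* pretend we slept until the next txg */
		return (0);
	}
	return (MOCK_ERESTART);		/* TXG_NOWAIT and TXG_WAITED */
}

/* Mock of dmu_tx_wait(): wait for the open txg to advance. */
static void
mock_tx_wait(mock_tx_t *tx)
{
	tx->waits++;
	tx->txg_full = 0;
}

/* Caller pattern for code that holds locks while assigning a tx. */
static int
do_update(mock_tx_t *tx)
{
	int waited = 0;
	int error;

	for (;;) {
		/* ...take locks, make dmu_tx_hold_*() calls... */
		error = mock_tx_assign(tx, waited ? TXG_WAITED : TXG_NOWAIT);
		if (error == 0)
			break;
		if (error != MOCK_ERESTART)
			return (error);
		/* ...drop locks before blocking... */
		waited = 1;
		mock_tx_wait(tx);
	}
	/* ...dirty buffers, then dmu_tx_commit()... */
	return (0);
}
```

In real code, the ERESTART branch also calls dmu_tx_abort() and builds a fresh transaction before retrying. The mock keeps a single object so the attempt and wait counts are easy to check.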
*** 289,299 ****
  int dmu_objset_find(char *name, int func(const char *, void *), void *arg,
      int flags);
  void dmu_objset_byteswap(void *buf, size_t size);
  int dsl_dataset_rename_snapshot(const char *fsname,
      const char *oldsnapname, const char *newsnapname, boolean_t recursive);
- int dmu_objset_remap_indirects(const char *fsname);
  
  typedef struct dmu_buf {
          uint64_t db_object;             /* object that this buffer is part of */
          uint64_t db_offset;             /* byte offset in this object */
          uint64_t db_size;               /* size of buffer in bytes */
--- 304,313 ----
*** 326,339 ****
  #define DMU_POOL_FREE_BPOBJ             "free_bpobj"
  #define DMU_POOL_BPTREE_OBJ             "bptree_obj"
  #define DMU_POOL_EMPTY_BPOBJ            "empty_bpobj"
  #define DMU_POOL_CHECKSUM_SALT          "org.illumos:checksum_salt"
  #define DMU_POOL_VDEV_ZAP_MAP           "com.delphix:vdev_zap_map"
- #define DMU_POOL_REMOVING               "com.delphix:removing"
- #define DMU_POOL_OBSOLETE_BPOBJ         "com.delphix:obsolete_bpobj"
- #define DMU_POOL_CONDENSING_INDIRECT    "com.delphix:condensing_indirect"
  
  /*
   * Allocate an object from this objset.  The range of object numbers
   * available is (0, DN_MAX_OBJECT).  Object 0 is the meta-dnode.
   *
   * The transaction must be assigned to a txg.  The newly allocated
--- 340,355 ----
  #define DMU_POOL_FREE_BPOBJ             "free_bpobj"
  #define DMU_POOL_BPTREE_OBJ             "bptree_obj"
  #define DMU_POOL_EMPTY_BPOBJ            "empty_bpobj"
  #define DMU_POOL_CHECKSUM_SALT          "org.illumos:checksum_salt"
  #define DMU_POOL_VDEV_ZAP_MAP           "com.delphix:vdev_zap_map"
  
+ #define DMU_POOL_COS_PROPS              "cos_props"
+ #define DMU_POOL_VDEV_PROPS             "vdev_props"
+ #define DMU_POOL_TRIM_START_TIME        "trim_start_time"
+ #define DMU_POOL_TRIM_STOP_TIME         "trim_stop_time"
+ 
  /*
   * Allocate an object from this objset.  The range of object numbers
   * available is (0, DN_MAX_OBJECT).  Object 0 is the meta-dnode.
   *
   * The transaction must be assigned to a txg.  The newly allocated
*** 412,423 ****
   * apply to all newly written blocks; existing blocks will not be affected.
   */
  void dmu_object_set_compress(objset_t *os, uint64_t object, uint8_t compress,
      dmu_tx_t *tx);
  
- int dmu_object_remap_indirects(objset_t *os, uint64_t object, uint64_t txg);
- 
  void
  dmu_write_embedded(objset_t *os, uint64_t object, uint64_t offset,
      void *data, uint8_t etype, uint8_t comp, int uncompressed_size,
      int compressed_size, int byteorder, dmu_tx_t *tx);
  
--- 428,437 ----
*** 426,443 ****
   */
  #define WP_NOFILL       0x1
  #define WP_DMU_SYNC     0x2
  #define WP_SPILL        0x4
  
  void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp,
      struct zio_prop *zp);
  /*
   * The bonus data is accessed more or less like a regular buffer.
   * You must dmu_bonus_hold() to get the buffer, which will give you a
   * dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
!  * data.  As with any normal buffer, you must call dmu_buf_will_dirty()
!  * before modifying it, and the
   * object must be held in an assigned transaction before calling
   * dmu_buf_will_dirty.  You may use dmu_buf_set_user() on the bonus
   * buffer as well.  You must release your hold with dmu_buf_rele().
   *
   * Returns ENOENT, EIO, or 0.
--- 440,469 ----
   */
  #define WP_NOFILL       0x1
  #define WP_DMU_SYNC     0x2
  #define WP_SPILL        0x4
  
+ #define WP_SPECIALCLASS_SHIFT   (16)
+ #define WP_SPECIALCLASS_BITS    (1) /* 1 bit per storage class */
+ #define WP_SPECIALCLASS_MASK    (((1 << WP_SPECIALCLASS_BITS) - 1) \
+         << WP_SPECIALCLASS_SHIFT)
+ 
+ #define WP_SET_SPECIALCLASS(flags, sclass)      { \
+         flags |= ((sclass << WP_SPECIALCLASS_SHIFT) & WP_SPECIALCLASS_MASK); \
+ }
+ 
+ #define WP_GET_SPECIALCLASS(flags) \
+         ((flags & WP_SPECIALCLASS_MASK) >> WP_SPECIALCLASS_SHIFT)
+ 
  void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp,
      struct zio_prop *zp);
  /*
   * The bonus data is accessed more or less like a regular buffer.
   * You must dmu_bonus_hold() to get the buffer, which will give you a
   * dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
!  * data.  As with any normal buffer, you must call dmu_buf_read() to
!  * read db_data, dmu_buf_will_dirty() before modifying it, and the
   * object must be held in an assigned transaction before calling
   * dmu_buf_will_dirty.  You may use dmu_buf_set_user() on the bonus
   * buffer as well.  You must release your hold with dmu_buf_rele().
   *
   * Returns ENOENT, EIO, or 0.
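The sketch below shows how the WP_SPECIALCLASS bits sit above the existing WP_* flags. The class occupies bit 16, so it never collides with WP_NOFILL, WP_DMU_SYNC or WP_SPILL.

The setter is a hypothetical variant of WP_SET_SPECIALCLASS with parenthesized arguments and a do { } while (0) body. The diff's brace-only form would not compile in an unbraced if/else, because the semicolon after the closing brace ends the if statement. wp_with_class() is likewise hypothetical.

```c
#include <assert.h>

#define	WP_NOFILL	0x1
#define	WP_DMU_SYNC	0x2
#define	WP_SPILL	0x4

#define	WP_SPECIALCLASS_SHIFT	(16)
#define	WP_SPECIALCLASS_BITS	(1)	/* 1 bit per storage class */
#define	WP_SPECIALCLASS_MASK	(((1 << WP_SPECIALCLASS_BITS) - 1) \
	<< WP_SPECIALCLASS_SHIFT)

/* Hypothetical hardened variant of WP_SET_SPECIALCLASS. */
#define	WP_SET_SPECIALCLASS(flags, sclass) do { \
	(flags) |= (((sclass) << WP_SPECIALCLASS_SHIFT) & \
	    WP_SPECIALCLASS_MASK); \
} while (0)

#define	WP_GET_SPECIALCLASS(flags) \
	(((flags) & WP_SPECIALCLASS_MASK) >> WP_SPECIALCLASS_SHIFT)

/* Hypothetical helper: tag (or untag) a write-policy word with a class. */
static int
wp_with_class(int wp, int sclass)
{
	if (sclass != 0)
		WP_SET_SPECIALCLASS(wp, sclass);	/* legal in if/else */
	else
		wp &= ~WP_SPECIALCLASS_MASK;
	return (wp);
}
```

Because WP_SPECIALCLASS_BITS is 1, only the low bit of the class value survives. For example, a class of 2 reads back as 0.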
*** 650,659 ****
--- 676,686 ----
   * The transaction (tx) must be assigned to a txg (ie. you've called
   * dmu_tx_assign()).  The buffer's object must be held in the tx
   * (ie. you've called dmu_tx_hold_object(tx, db->db_object)).
   */
  void dmu_buf_will_dirty(dmu_buf_t *db, dmu_tx_t *tx);
+ void dmu_buf_will_dirty_sc(dmu_buf_t *db, dmu_tx_t *tx, boolean_t sc);
  
  /*
   * You must create a transaction, then hold the objects which you will
   * (or might) modify as part of this transaction.  Then you must assign
   * the transaction to a transaction group.  Once the transaction has
*** 680,700 ****
      int len);
  void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
      uint64_t len);
  void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
      uint64_t len);
- void dmu_tx_hold_remap_l1indirect(dmu_tx_t *tx, uint64_t object);
  void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
  void dmu_tx_hold_zap_by_dnode(dmu_tx_t *tx, dnode_t *dn, int add,
      const char *name);
  void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
  void dmu_tx_hold_bonus_by_dnode(dmu_tx_t *tx, dnode_t *dn);
  void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
  void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
  void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
  void dmu_tx_abort(dmu_tx_t *tx);
! int dmu_tx_assign(dmu_tx_t *tx, uint64_t txg_how);
  void dmu_tx_wait(dmu_tx_t *tx);
  void dmu_tx_commit(dmu_tx_t *tx);
  void dmu_tx_mark_netfree(dmu_tx_t *tx);
  
  /*
--- 707,726 ----
      int len);
  void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
      uint64_t len);
  void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
      uint64_t len);
  void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
  void dmu_tx_hold_zap_by_dnode(dmu_tx_t *tx, dnode_t *dn, int add,
      const char *name);
  void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
  void dmu_tx_hold_bonus_by_dnode(dmu_tx_t *tx, dnode_t *dn);
  void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
  void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
  void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
  void dmu_tx_abort(dmu_tx_t *tx);
! int dmu_tx_assign(dmu_tx_t *tx, enum txg_how txg_how);
  void dmu_tx_wait(dmu_tx_t *tx);
  void dmu_tx_commit(dmu_tx_t *tx);
  void dmu_tx_mark_netfree(dmu_tx_t *tx);
  
  /*
*** 796,805 ****
--- 822,832 ----
  typedef void arc_byteswap_func_t(void *buf, size_t size);
  
  typedef struct dmu_object_type_info {
          dmu_object_byteswap_t   ot_byteswap;
          boolean_t               ot_metadata;
+         boolean_t               ot_dbuf_metadata_cache;
          char                    *ot_name;
  } dmu_object_type_info_t;
  
  typedef struct dmu_object_byteswap_info {
          arc_byteswap_func_t     *ob_func;
*** 832,841 ****
--- 859,869 ----
          uint64_t dds_num_clones; /* number of clones of this */
          uint64_t dds_creation_txg;
          uint64_t dds_guid;
          dmu_objset_type_t dds_type;
          uint8_t dds_is_snapshot;
+         uint8_t dds_is_autosnapshot;
          uint8_t dds_inconsistent;
          char dds_origin[ZFS_MAX_DATASET_NAME_LEN];
  } dmu_objset_stats_t;
  
  /*
*** 885,894 ****
--- 913,924 ----
  extern void dmu_objset_name(objset_t *os, char *buf);
  extern dmu_objset_type_t dmu_objset_type(objset_t *os);
  extern uint64_t dmu_objset_id(objset_t *os);
  extern zfs_sync_type_t dmu_objset_syncprop(objset_t *os);
  extern zfs_logbias_op_t dmu_objset_logbias(objset_t *os);
+ int dmu_clone_list_next(objset_t *os, int len, char *name,
+     uint64_t *idp, uint64_t *offp);
  extern int dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
      uint64_t *id, uint64_t *offp, boolean_t *case_conflict);
  extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
      int maxlen, boolean_t *conflict);
  extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,