NEX-13140 DVA-throttle support for special-class
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5318 Cleanup specialclass property (obsolete, not used) and fix related meta-to-special case
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5058 WBC: Race between the purging of window and opening new one
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-2830 ZFS smart compression
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4044 remove sha1crc32 in preparation for upstream merge of edon-r and skein
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Conflicts:
        usr/src/uts/common/fs/zfs/sys/zio_checksum.h
NEX-4028 use lz4 by default
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4003 WRC: System panics on debug build
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Conflicts:
    usr/src/uts/common/io/scsi/targets/sd.c
    usr/src/uts/common/sys/scsi/targets/sddef.h
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
OS-70 remove zio timer code
Issues #7: Reconcile L2ARC and "special" use by datasets
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12643 rb4064 ZFS meta refactoring - vdev utilization tracking, auto-dedup
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
re #12393 rb3935 Kerberos and smbd disagree about who is our AD server (fix elf runtime attributes check)
re #11612 rb3907 Failing vdev of a mirrored pool should not take zfs operations out of action for extended periods of time.
re #8346 rb2639 KT disk failures
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
        
*** 19,29 ****
   * CDDL HEADER END
   */
  
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
!  * Copyright 2011 Nexenta Systems, Inc.  All rights reserved.
   * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
   * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
   * Copyright (c) 2013, Joyent, Inc. All rights reserved.
   * Copyright 2016 Toomas Soome <tsoome@me.com>
   */
--- 19,29 ----
   * CDDL HEADER END
   */
  
  /*
   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
!  * Copyright 2016 Nexenta Systems, Inc.  All rights reserved.
   * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
   * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
   * Copyright (c) 2013, Joyent, Inc. All rights reserved.
   * Copyright 2016 Toomas Soome <tsoome@me.com>
   */
*** 42,51 ****
--- 42,61 ----
  #ifdef  __cplusplus
  extern "C" {
  #endif
  
  /*
+  * Checksum state w.r.t. SHA256 acceleration.
+  */
+ typedef enum {
+         CKSTATE_NONE = 0,
+         CKSTATE_WAITING,
+         CKSTATE_CHECKSUMMING,
+         CKSTATE_CHECKSUM_DONE
+ } zio_checksum_state_t;
+ 
+ /*
   * Embedded checksum
   */
  #define ZEC_MAGIC       0x210da7ab10c7a11ULL
  
  typedef struct zio_eck {
*** 136,145 ****
--- 146,167 ----
  
  #define ZIO_FAILURE_MODE_WAIT           0
  #define ZIO_FAILURE_MODE_CONTINUE       1
  #define ZIO_FAILURE_MODE_PANIC          2
  
+ /*
+  * Macro for asserting validity of the priorities obtained by conversion
+  * from CoS/vdev properties
+  */
+ #define ZIO_PRIORITY_QUEUEABLE_VALID(prio)      \
+         (((prio) >= ZIO_PRIORITY_SYNC_READ) &&  \
+         ((prio) < ZIO_PRIORITY_NUM_QUEUEABLE))
+ 
+ #define ZIO_PIPELINE_CONTINUE           0x100
+ #define ZIO_PIPELINE_STOP               0x101
+ #define ZIO_PIPELINE_RESTART_STAGE      0x102
+ 
  enum zio_flag {
          /*
           * Flags inherited by gang, ddt, and vdev children,
           * and that must be equal for two zios to aggregate
           */
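
Note on the hunk above: ZIO_PRIORITY_QUEUEABLE_VALID is a half-open range check, accepting any value from ZIO_PRIORITY_SYNC_READ up to but not including ZIO_PRIORITY_NUM_QUEUEABLE. The sketch below exercises the macro outside the kernel. The enum is a reduced stand-in written for illustration (the real zio_priority_t lives in another header and may list different members in this tree), and prio_is_queueable() is a hypothetical helper, not part of the change.

```c
#include <assert.h>

/*
 * Stand-in for zio_priority_t. ASSUMPTION: member list and order are
 * illustrative only; the real enum in this tree may differ.
 */
typedef enum {
	ZIO_PRIORITY_SYNC_READ = 0,
	ZIO_PRIORITY_SYNC_WRITE,
	ZIO_PRIORITY_ASYNC_READ,
	ZIO_PRIORITY_ASYNC_WRITE,
	ZIO_PRIORITY_SCRUB,
	ZIO_PRIORITY_NUM_QUEUEABLE,	/* first non-queueable value */
	ZIO_PRIORITY_NOW
} zio_priority_t;

/* Copied from the hunk above. */
#define	ZIO_PRIORITY_QUEUEABLE_VALID(prio)	\
	(((prio) >= ZIO_PRIORITY_SYNC_READ) &&	\
	((prio) < ZIO_PRIORITY_NUM_QUEUEABLE))

/*
 * Hypothetical helper: takes a plain int, as a value converted from a
 * CoS/vdev property might arrive, and reports whether it names a
 * queueable priority (1) or not (0).
 */
static int
prio_is_queueable(int prio)
{
	return (ZIO_PRIORITY_QUEUEABLE_VALID(prio) ? 1 : 0);
}
```

In kernel code the macro would more likely sit inside an ASSERT() right after the property-to-priority conversion. Taking an int here keeps the lower-bound comparison meaningful for negative inputs.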
*** 204,234 ****
          (((zio)->io_flags & ZIO_FLAG_GANG_INHERIT) |            \
          ZIO_FLAG_GANG_CHILD | ZIO_FLAG_CANFAIL)
  
  #define ZIO_VDEV_CHILD_FLAGS(zio)                               \
          (((zio)->io_flags & ZIO_FLAG_VDEV_INHERIT) |            \
!         ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_CANFAIL)
  
- #define ZIO_CHILD_BIT(x)                (1 << (x))
- #define ZIO_CHILD_BIT_IS_SET(val, x)    ((val) & (1 << (x)))
- 
  enum zio_child {
          ZIO_CHILD_VDEV = 0,
          ZIO_CHILD_GANG,
          ZIO_CHILD_DDT,
          ZIO_CHILD_LOGICAL,
          ZIO_CHILD_TYPES
  };
  
- #define ZIO_CHILD_VDEV_BIT              ZIO_CHILD_BIT(ZIO_CHILD_VDEV)
- #define ZIO_CHILD_GANG_BIT              ZIO_CHILD_BIT(ZIO_CHILD_GANG)
- #define ZIO_CHILD_DDT_BIT               ZIO_CHILD_BIT(ZIO_CHILD_DDT)
- #define ZIO_CHILD_LOGICAL_BIT           ZIO_CHILD_BIT(ZIO_CHILD_LOGICAL)
- #define ZIO_CHILD_ALL_BITS                                      \
-         (ZIO_CHILD_VDEV_BIT | ZIO_CHILD_GANG_BIT |              \
-         ZIO_CHILD_DDT_BIT | ZIO_CHILD_LOGICAL_BIT)
- 
  enum zio_wait_type {
          ZIO_WAIT_READY = 0,
          ZIO_WAIT_DONE,
          ZIO_WAIT_TYPES
  };
--- 226,245 ----
          (((zio)->io_flags & ZIO_FLAG_GANG_INHERIT) |            \
          ZIO_FLAG_GANG_CHILD | ZIO_FLAG_CANFAIL)
  
  #define ZIO_VDEV_CHILD_FLAGS(zio)                               \
          (((zio)->io_flags & ZIO_FLAG_VDEV_INHERIT) |            \
!         ZIO_FLAG_CANFAIL)
  
  enum zio_child {
          ZIO_CHILD_VDEV = 0,
          ZIO_CHILD_GANG,
          ZIO_CHILD_DDT,
          ZIO_CHILD_LOGICAL,
          ZIO_CHILD_TYPES
  };
  
  enum zio_wait_type {
          ZIO_WAIT_READY = 0,
          ZIO_WAIT_DONE,
          ZIO_WAIT_TYPES
  };
*** 243,252 ****
--- 254,265 ----
  typedef void zio_done_func_t(zio_t *zio);
  
  extern boolean_t zio_dva_throttle_enabled;
  extern const char *zio_type_name[ZIO_TYPES];
  
+ struct range_tree;
+ 
  /*
   * A bookmark is a four-tuple <objset, object, level, blkid> that uniquely
   * identifies any block in the pool.  By convention, the meta-objset (MOS)
   * is objset 0, and the meta-dnode is object 0.  This covers all blocks
   * except root blocks and ZIL blocks, which are defined as follows:
*** 302,314 ****
          enum zio_checksum       zp_checksum;
          enum zio_compress       zp_compress;
          dmu_object_type_t       zp_type;
          uint8_t                 zp_level;
          uint8_t                 zp_copies;
!         boolean_t               zp_dedup;
!         boolean_t               zp_dedup_verify;
          boolean_t               zp_nopwrite;
  } zio_prop_t;
  
  typedef struct zio_cksum_report zio_cksum_report_t;
  
  typedef void zio_cksum_finish_f(zio_cksum_report_t *rep,
--- 315,331 ----
          enum zio_checksum       zp_checksum;
          enum zio_compress       zp_compress;
          dmu_object_type_t       zp_type;
          uint8_t                 zp_level;
          uint8_t                 zp_copies;
!         uint8_t                 zp_dedup;
!         uint8_t                 zp_dedup_verify;
          boolean_t               zp_nopwrite;
+         boolean_t               zp_metadata;
+         boolean_t               zp_usesc;
+         boolean_t               zp_usewbc;
+         uint64_t                zp_zpl_meta_to_special;
  } zio_prop_t;
  
  typedef struct zio_cksum_report zio_cksum_report_t;
  
  typedef void zio_cksum_finish_f(zio_cksum_report_t *rep,
*** 384,393 ****
--- 401,434 ----
          zio_t           *zl_child;
          list_node_t     zl_parent_node;
          list_node_t     zl_child_node;
  } zio_link_t;
  
+ /*
+  * When smart compression is enabled, this callback info structure is
+  * passed to write zio's to monitor per-object compression performance.
+  *
+  * When zio_write determines that the `compression' setting for the dataset
+  * is not `off', if `sc_ask' is not NULL, it will call the `sc_ask' callback
+  * function, asking the upper layers whether it should really try to compress
+  * the object in question. If the function returns B_TRUE, compression is
+  * attempted. Once compression is done, sc_result is called to inform the
+  * upper layers of the compression result. By comparing the zio's io_size to
+  * io_orig_size it can monitor compression performance on the particular
+  * object in question (if io_size == io_orig_size, then compression failed).
+  * It is not legal to pass a NULL sc_ask but non-NULL sc_result to zio_write.
+  */
+ typedef struct zio_smartcomp_info {
+         boolean_t (*sc_ask)(void *userinfo, const zio_t *);
+         void (*sc_result)(void *userinfo, const zio_t *);
+         void *sc_userinfo;
+ } zio_smartcomp_info_t;
+ 
+ #define ZIO_SHOULD_COMPRESS(zio) \
+         ((zio)->io_smartcomp.sc_ask == NULL || \
+         (zio)->io_smartcomp.sc_ask((zio)->io_smartcomp.sc_userinfo, (zio)))
+ 
  struct zio {
          /* Core information about this I/O */
          zbookmark_phys_t        io_bookmark;
          zio_prop_t      io_prop;
          zio_type_t      io_type;
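
Note on the hunk above: the block comment describes an ask/result protocol between the write path and the upper layers. The following is a minimal user-space sketch of that protocol, assuming a reduced stand-in for struct zio with only the fields the comment mentions. obj_comp_stats_t, obj_ask(), obj_result() and mock_write() are hypothetical names invented for illustration; the real policy and the real zio_write() are not shown in this diff.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { B_FALSE = 0, B_TRUE = 1 } boolean_t;
typedef struct zio zio_t;

/* Copied from the hunk above. */
typedef struct zio_smartcomp_info {
	boolean_t (*sc_ask)(void *userinfo, const zio_t *);
	void (*sc_result)(void *userinfo, const zio_t *);
	void *sc_userinfo;
} zio_smartcomp_info_t;

/* ASSUMPTION: reduced stand-in for struct zio, illustration only. */
struct zio {
	uint64_t		io_size;	/* size after compression */
	uint64_t		io_orig_size;	/* size before compression */
	zio_smartcomp_info_t	io_smartcomp;
};

/* Copied from the hunk above. */
#define	ZIO_SHOULD_COMPRESS(zio) \
	((zio)->io_smartcomp.sc_ask == NULL || \
	(zio)->io_smartcomp.sc_ask((zio)->io_smartcomp.sc_userinfo, (zio)))

/* Hypothetical per-object state kept by the upper layer. */
typedef struct obj_comp_stats {
	unsigned	failures;	/* consecutive incompressible writes */
	unsigned	max_failures;	/* stop trying at this count */
} obj_comp_stats_t;

/* sc_ask: allow compression until too many attempts in a row failed. */
static boolean_t
obj_ask(void *userinfo, const zio_t *zio)
{
	const obj_comp_stats_t *st = userinfo;

	(void) zio;
	return (st->failures < st->max_failures ? B_TRUE : B_FALSE);
}

/* sc_result: io_size == io_orig_size means compression failed. */
static void
obj_result(void *userinfo, const zio_t *zio)
{
	obj_comp_stats_t *st = userinfo;

	if (zio->io_size == zio->io_orig_size)
		st->failures++;
	else
		st->failures = 0;
}

/*
 * Hypothetical stand-in for the compression step of a write.
 * compressed_size is what a compressor would have produced.
 * Returns B_TRUE if compression was attempted.
 */
static boolean_t
mock_write(zio_t *zio, uint64_t compressed_size)
{
	if (!ZIO_SHOULD_COMPRESS(zio))
		return (B_FALSE);

	zio->io_size = (compressed_size < zio->io_orig_size) ?
	    compressed_size : zio->io_orig_size;
	if (zio->io_smartcomp.sc_result != NULL)
		zio->io_smartcomp.sc_result(zio->io_smartcomp.sc_userinfo, zio);
	return (B_TRUE);
}
```

With max_failures set to 2, two incompressible writes in a row cause the third write to skip compression entirely. A zio whose sc_ask is NULL always attempts compression, which matches the first arm of ZIO_SHOULD_COMPRESS.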
*** 403,412 ****
--- 444,454 ----
          blkptr_t        io_bp_copy;
          list_t          io_parent_list;
          list_t          io_child_list;
          zio_t           *io_logical;
          zio_transform_t *io_transform_stack;
+         zio_smartcomp_info_t    io_smartcomp;
  
          /* Callback info */
          zio_done_func_t *io_ready;
          zio_done_func_t *io_children_ready;
          zio_done_func_t *io_physdone;
*** 463,472 ****
--- 505,528 ----
          zio_cksum_report_t *io_cksum_report;
          uint64_t        io_ena;
  
          /* Taskq dispatching state */
          taskq_ent_t     io_tqent;
+ 
+         /* Timestamp for tracking vdev I/O latency */
+         hrtime_t io_vd_timestamp;
+ 
+         /* Checksum acceleration */
+         zio_checksum_state_t    zio_checksum_state;
+         zio_cksum_t             *zio_checksump;
+         void                    *zio_checksum_datap;
+         uint64_t                zio_checksum_data_size;
+         struct zio              *zio_checksum_next;
+         zio_cksum_t             actual_cksum;
+ 
+         /* Metaslab class that will be used */
+         metaslab_class_t *io_mc;
  };
  
  extern int zio_bookmark_compare(const void *, const void *);
  
  extern zio_t *zio_null(zio_t *pio, spa_t *spa, vdev_t *vd,
*** 482,492 ****
  extern zio_t *zio_write(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
      struct abd *data, uint64_t size, uint64_t psize, const zio_prop_t *zp,
      zio_done_func_t *ready, zio_done_func_t *children_ready,
      zio_done_func_t *physdone, zio_done_func_t *done,
      void *private, zio_priority_t priority, enum zio_flag flags,
!     const zbookmark_phys_t *zb);
  
  extern zio_t *zio_rewrite(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
      struct abd *data, uint64_t size, zio_done_func_t *done, void *private,
      zio_priority_t priority, enum zio_flag flags, zbookmark_phys_t *zb);
  
--- 538,549 ----
  extern zio_t *zio_write(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
      struct abd *data, uint64_t size, uint64_t psize, const zio_prop_t *zp,
      zio_done_func_t *ready, zio_done_func_t *children_ready,
      zio_done_func_t *physdone, zio_done_func_t *done,
      void *private, zio_priority_t priority, enum zio_flag flags,
!     const zbookmark_phys_t *zb,
!     const zio_smartcomp_info_t *smartcomp);
  
  extern zio_t *zio_rewrite(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
      struct abd *data, uint64_t size, zio_done_func_t *done, void *private,
      zio_priority_t priority, enum zio_flag flags, zbookmark_phys_t *zb);
  
*** 500,509 ****
--- 557,570 ----
      zio_done_func_t *done, void *private, enum zio_flag flags);
  
  extern zio_t *zio_ioctl(zio_t *pio, spa_t *spa, vdev_t *vd, int cmd,
      zio_done_func_t *done, void *private, enum zio_flag flags);
  
+ extern zio_t *zio_trim(spa_t *spa, vdev_t *vd, struct range_tree *tree,
+     zio_done_func_t *done, void *private, enum zio_flag flags,
+     int dkiocfree_flags, metaslab_t *msp);
+ 
  extern zio_t *zio_read_phys(zio_t *pio, vdev_t *vd, uint64_t offset,
      uint64_t size, struct abd *data, int checksum,
      zio_done_func_t *done, void *private, zio_priority_t priority,
      enum zio_flag flags, boolean_t labels);
  
*** 510,519 ****
--- 571,583 ----
  extern zio_t *zio_write_phys(zio_t *pio, vdev_t *vd, uint64_t offset,
      uint64_t size, struct abd *data, int checksum,
      zio_done_func_t *done, void *private, zio_priority_t priority,
      enum zio_flag flags, boolean_t labels);
  
+ extern zio_t *zio_wbc(zio_type_t type, vdev_t *vd, abd_t *data,
+     uint64_t size, uint64_t offset);
+ 
  extern zio_t *zio_free_sync(zio_t *pio, spa_t *spa, uint64_t txg,
      const blkptr_t *bp, enum zio_flag flags);
  
  extern int zio_alloc_zil(spa_t *spa, uint64_t txg, blkptr_t *new_bp,
      blkptr_t *old_bp, uint64_t size, boolean_t *slog);
*** 602,612 ****
  extern void zfs_ereport_finish_checksum(zio_cksum_report_t *report,
      const void *good_data, const void *bad_data, boolean_t drop_if_identical);
  
  extern void zfs_ereport_send_interim_checksum(zio_cksum_report_t *report);
  extern void zfs_ereport_free_checksum(zio_cksum_report_t *report);
- 
  /* If we have the good data in hand, this function can be used */
  extern void zfs_ereport_post_checksum(spa_t *spa, vdev_t *vd,
      struct zio *zio, uint64_t offset, uint64_t length,
      const void *good_data, const void *bad_data, struct zio_bad_cksum *info);
  
--- 666,675 ----
*** 617,626 ****
--- 680,692 ----
  boolean_t zbookmark_subtree_completed(const struct dnode_phys *dnp,
      const zbookmark_phys_t *subtree_root, const zbookmark_phys_t *last_block);
  int zbookmark_compare(uint16_t dbss1, uint8_t ibs1, uint16_t dbss2,
      uint8_t ibs2, const zbookmark_phys_t *zb1, const zbookmark_phys_t *zb2);
  
+ /* best effort dedup */
+ void zio_best_effort_dedup(zio_t *zio);
+ 
  #ifdef  __cplusplus
  }
  #endif
  
  #endif  /* _ZIO_H */