NEX-13140 DVA-throttle support for special-class
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5318 Cleanup specialclass property (obsolete, not used) and fix related meta-to-special case
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5058 WBC: Race between the purging of window and opening new one
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-2830 ZFS smart compression
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4044 remove sha1crc32 in preparation for upstream merge of Edon-R and Skein
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Conflicts:
        usr/src/uts/common/fs/zfs/sys/zio_checksum.h
NEX-4028 use lz4 by default
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4003 WRC: System panics on debug build
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Conflicts:
    usr/src/uts/common/io/scsi/targets/sd.c
    usr/src/uts/common/sys/scsi/targets/sddef.h
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
OS-70 remove zio timer code
Issues #7: Reconcile L2ARC and "special" use by datasets
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12643 rb4064 ZFS meta refactoring - vdev utilization tracking, auto-dedup
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
re #12393 rb3935 Kerberos and smbd disagree about who is our AD server (fix elf runtime attributes check)
re #11612 rb3907 Failing vdev of a mirrored pool should not take zfs operations out of action for extended periods of time.
re #8346 rb2639 KT disk failures
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties

@@ -19,11 +19,11 @@
  * CDDL HEADER END
  */
 
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright 2011 Nexenta Systems, Inc.  All rights reserved.
+ * Copyright 2016 Nexenta Systems, Inc.  All rights reserved.
  * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
  * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
  * Copyright (c) 2013, Joyent, Inc. All rights reserved.
  * Copyright 2016 Toomas Soome <tsoome@me.com>
  */

@@ -42,10 +42,20 @@
 #ifdef  __cplusplus
 extern "C" {
 #endif
 
 /*
+ * Checksum state w.r.t. SHA256 acceleration.
+ */
+typedef enum {
+        CKSTATE_NONE = 0,
+        CKSTATE_WAITING,
+        CKSTATE_CHECKSUMMING,
+        CKSTATE_CHECKSUM_DONE
+} zio_checksum_state_t;
+
+/*
  * Embedded checksum
  */
 #define ZEC_MAGIC       0x210da7ab10c7a11ULL
 
 typedef struct zio_eck {

@@ -136,10 +146,22 @@
 
 #define ZIO_FAILURE_MODE_WAIT           0
 #define ZIO_FAILURE_MODE_CONTINUE       1
 #define ZIO_FAILURE_MODE_PANIC          2
 
+/*
+ * Macro for asserting validity of the priorities obtained by conversion
+ * from CoS/vdev properties
+ */
+#define ZIO_PRIORITY_QUEUEABLE_VALID(prio)      \
+        (((prio) >= ZIO_PRIORITY_SYNC_READ) &&  \
+        ((prio) < ZIO_PRIORITY_NUM_QUEUEABLE))
+
+#define ZIO_PIPELINE_CONTINUE           0x100
+#define ZIO_PIPELINE_STOP               0x101
+#define ZIO_PIPELINE_RESTART_STAGE      0x102
+
 enum zio_flag {
         /*
          * Flags inherited by gang, ddt, and vdev children,
          * and that must be equal for two zios to aggregate
          */
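
The ZIO_PRIORITY_QUEUEABLE_VALID macro added in the hunk above is a half-open range check on a priority value. A minimal standalone sketch of how such a check might guard a property-to-priority conversion follows. The enum ordering is an assumption modelled on the illumos zio_priority_t of this era, and every sketch_* / SKETCH_* name is hypothetical, not part of this change.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Assumed ordering, modelled on the illumos zio_priority_t; the exact
 * enumerators in this tree may differ.
 */
typedef enum sketch_priority {
	SKETCH_PRIORITY_SYNC_READ = 0,
	SKETCH_PRIORITY_SYNC_WRITE,
	SKETCH_PRIORITY_ASYNC_READ,
	SKETCH_PRIORITY_ASYNC_WRITE,
	SKETCH_PRIORITY_SCRUB,
	SKETCH_PRIORITY_NUM_QUEUEABLE,	/* count of queueable classes */
	SKETCH_PRIORITY_NOW		/* issued immediately, not queueable */
} sketch_priority_t;

/* Same shape as ZIO_PRIORITY_QUEUEABLE_VALID: lower bound inclusive. */
#define	SKETCH_PRIORITY_QUEUEABLE_VALID(prio)		\
	(((prio) >= SKETCH_PRIORITY_SYNC_READ) &&	\
	((prio) < SKETCH_PRIORITY_NUM_QUEUEABLE))

/*
 * Hypothetical converter: accept a raw property value only if it maps
 * to a queueable priority. Returns false and leaves *out untouched
 * otherwise.
 */
static bool
sketch_prop_to_priority(long long propval, sketch_priority_t *out)
{
	if (!SKETCH_PRIORITY_QUEUEABLE_VALID(propval))
		return (false);
	*out = (sketch_priority_t)propval;
	return (true);
}
```

In the kernel the macro would more likely sit inside an ASSERT() than drive a return value; the boolean form is used here only so the sketch can be exercised outside the kernel.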

@@ -204,31 +226,20 @@
         (((zio)->io_flags & ZIO_FLAG_GANG_INHERIT) |            \
         ZIO_FLAG_GANG_CHILD | ZIO_FLAG_CANFAIL)
 
 #define ZIO_VDEV_CHILD_FLAGS(zio)                               \
         (((zio)->io_flags & ZIO_FLAG_VDEV_INHERIT) |            \
-        ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_CANFAIL)
+        ZIO_FLAG_CANFAIL)
 
-#define ZIO_CHILD_BIT(x)                (1 << (x))
-#define ZIO_CHILD_BIT_IS_SET(val, x)    ((val) & (1 << (x)))
-
 enum zio_child {
         ZIO_CHILD_VDEV = 0,
         ZIO_CHILD_GANG,
         ZIO_CHILD_DDT,
         ZIO_CHILD_LOGICAL,
         ZIO_CHILD_TYPES
 };
 
-#define ZIO_CHILD_VDEV_BIT              ZIO_CHILD_BIT(ZIO_CHILD_VDEV)
-#define ZIO_CHILD_GANG_BIT              ZIO_CHILD_BIT(ZIO_CHILD_GANG)
-#define ZIO_CHILD_DDT_BIT               ZIO_CHILD_BIT(ZIO_CHILD_DDT)
-#define ZIO_CHILD_LOGICAL_BIT           ZIO_CHILD_BIT(ZIO_CHILD_LOGICAL)
-#define ZIO_CHILD_ALL_BITS                                      \
-        (ZIO_CHILD_VDEV_BIT | ZIO_CHILD_GANG_BIT |              \
-        ZIO_CHILD_DDT_BIT | ZIO_CHILD_LOGICAL_BIT)
-
 enum zio_wait_type {
         ZIO_WAIT_READY = 0,
         ZIO_WAIT_DONE,
         ZIO_WAIT_TYPES
 };

@@ -243,10 +254,12 @@
 typedef void zio_done_func_t(zio_t *zio);
 
 extern boolean_t zio_dva_throttle_enabled;
 extern const char *zio_type_name[ZIO_TYPES];
 
+struct range_tree;
+
 /*
  * A bookmark is a four-tuple <objset, object, level, blkid> that uniquely
  * identifies any block in the pool.  By convention, the meta-objset (MOS)
  * is objset 0, and the meta-dnode is object 0.  This covers all blocks
  * except root blocks and ZIL blocks, which are defined as follows:

@@ -302,13 +315,17 @@
         enum zio_checksum       zp_checksum;
         enum zio_compress       zp_compress;
         dmu_object_type_t       zp_type;
         uint8_t                 zp_level;
         uint8_t                 zp_copies;
-        boolean_t               zp_dedup;
-        boolean_t               zp_dedup_verify;
+        uint8_t                 zp_dedup;
+        uint8_t                 zp_dedup_verify;
         boolean_t               zp_nopwrite;
+        boolean_t               zp_metadata;
+        boolean_t               zp_usesc;
+        boolean_t               zp_usewbc;
+        uint64_t                zp_zpl_meta_to_special;
 } zio_prop_t;
 
 typedef struct zio_cksum_report zio_cksum_report_t;
 
 typedef void zio_cksum_finish_f(zio_cksum_report_t *rep,

@@ -384,10 +401,34 @@
         zio_t           *zl_child;
         list_node_t     zl_parent_node;
         list_node_t     zl_child_node;
 } zio_link_t;
 
+/*
+ * When smart compression is enabled, this callback info structure is
+ * passed to write zio's to monitor per-object compression performance.
+ *
+ * When zio_write determines that the `compression' setting for the dataset
+ * is not `off', if `sc_ask' is not NULL, it will call the `sc_ask' callback
+ * function, asking the upper layers whether it should really try to compress
+ * the object in question. If the function returns B_TRUE, compression is
+ * attempted. Once compression is done, sc_result is called to inform the
+ * upper layers of the compression result. By comparing the zio's io_size to
+ * io_orig_size it can monitor compression performance on the particular
+ * object in question (if io_size == io_orig_size, then compression failed).
+ * It is not legal to pass a NULL sc_ask but non-NULL sc_result to zio_write.
+ */
+typedef struct zio_smartcomp_info {
+        boolean_t (*sc_ask)(void *userinfo, const zio_t *);
+        void (*sc_result)(void *userinfo, const zio_t *);
+        void *sc_userinfo;
+} zio_smartcomp_info_t;
+
+#define ZIO_SHOULD_COMPRESS(zio) \
+        ((zio)->io_smartcomp.sc_ask == NULL || \
+        (zio)->io_smartcomp.sc_ask((zio)->io_smartcomp.sc_userinfo, (zio)))
+
 struct zio {
         /* Core information about this I/O */
         zbookmark_phys_t        io_bookmark;
         zio_prop_t      io_prop;
         zio_type_t      io_type;
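
The smart compression comment in the hunk above describes an ask/result callback pair. The following is a minimal userland sketch of that contract, not the Nexenta implementation. The sketch_* types are simplified stand-ins for zio_t and zio_smartcomp_info_t, and the "stop after N consecutive failures" policy is an assumption chosen only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins for the kernel types; not the real zio_t. */
typedef struct sketch_zio sketch_zio_t;

typedef struct sketch_smartcomp_info {
	bool (*sc_ask)(void *userinfo, const sketch_zio_t *zio);
	void (*sc_result)(void *userinfo, const sketch_zio_t *zio);
	void *sc_userinfo;
} sketch_smartcomp_info_t;

struct sketch_zio {
	uint64_t io_size;	/* size after the compression attempt */
	uint64_t io_orig_size;	/* size before compression */
	sketch_smartcomp_info_t io_smartcomp;
};

/* Same shape as ZIO_SHOULD_COMPRESS: a NULL sc_ask means "always try". */
#define	SKETCH_SHOULD_COMPRESS(zio)				\
	((zio)->io_smartcomp.sc_ask == NULL ||			\
	(zio)->io_smartcomp.sc_ask((zio)->io_smartcomp.sc_userinfo, (zio)))

/* Assumed per-object policy state kept by the upper layer. */
typedef struct sketch_obj_state {
	unsigned failures;	/* consecutive failed compressions */
	unsigned max_failures;	/* give up once this many are seen */
} sketch_obj_state_t;

/* sc_ask: keep compressing until too many attempts in a row failed. */
static bool
sketch_ask(void *userinfo, const sketch_zio_t *zio)
{
	const sketch_obj_state_t *st = userinfo;

	(void) zio;
	return (st->failures < st->max_failures);
}

/* sc_result: io_size == io_orig_size means compression gained nothing. */
static void
sketch_result(void *userinfo, const sketch_zio_t *zio)
{
	sketch_obj_state_t *st = userinfo;

	if (zio->io_size == zio->io_orig_size)
		st->failures++;
	else
		st->failures = 0;
}
```

A caller would fill in a sketch_smartcomp_info_t with these two functions and its own state pointer, then hand it to the write path, in the same way the new trailing smartcomp argument is handed to zio_write() in this change.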

@@ -403,10 +444,11 @@
         blkptr_t        io_bp_copy;
         list_t          io_parent_list;
         list_t          io_child_list;
         zio_t           *io_logical;
         zio_transform_t *io_transform_stack;
+        zio_smartcomp_info_t    io_smartcomp;
 
         /* Callback info */
         zio_done_func_t *io_ready;
         zio_done_func_t *io_children_ready;
         zio_done_func_t *io_physdone;

@@ -463,10 +505,24 @@
         zio_cksum_report_t *io_cksum_report;
         uint64_t        io_ena;
 
         /* Taskq dispatching state */
         taskq_ent_t     io_tqent;
+
+        /* Timestamp for tracking vdev I/O latency */
+        hrtime_t io_vd_timestamp;
+
+        /* Checksum acceleration */
+        zio_checksum_state_t    zio_checksum_state;
+        zio_cksum_t             *zio_checksump;
+        void                    *zio_checksum_datap;
+        uint64_t                zio_checksum_data_size;
+        struct zio              *zio_checksum_next;
+        zio_cksum_t             actual_cksum;
+
+        /* Metaslab class that will be used */
+        metaslab_class_t *io_mc;
 };
 
 extern int zio_bookmark_compare(const void *, const void *);
 
 extern zio_t *zio_null(zio_t *pio, spa_t *spa, vdev_t *vd,

@@ -482,11 +538,12 @@
 extern zio_t *zio_write(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
     struct abd *data, uint64_t size, uint64_t psize, const zio_prop_t *zp,
     zio_done_func_t *ready, zio_done_func_t *children_ready,
     zio_done_func_t *physdone, zio_done_func_t *done,
     void *private, zio_priority_t priority, enum zio_flag flags,
-    const zbookmark_phys_t *zb);
+    const zbookmark_phys_t *zb,
+    const zio_smartcomp_info_t *smartcomp);
 
 extern zio_t *zio_rewrite(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
     struct abd *data, uint64_t size, zio_done_func_t *done, void *private,
     zio_priority_t priority, enum zio_flag flags, zbookmark_phys_t *zb);
 

@@ -500,10 +557,14 @@
     zio_done_func_t *done, void *private, enum zio_flag flags);
 
 extern zio_t *zio_ioctl(zio_t *pio, spa_t *spa, vdev_t *vd, int cmd,
     zio_done_func_t *done, void *private, enum zio_flag flags);
 
+extern zio_t *zio_trim(spa_t *spa, vdev_t *vd, struct range_tree *tree,
+    zio_done_func_t *done, void *private, enum zio_flag flags,
+    int dkiocfree_flags, metaslab_t *msp);
+
 extern zio_t *zio_read_phys(zio_t *pio, vdev_t *vd, uint64_t offset,
     uint64_t size, struct abd *data, int checksum,
     zio_done_func_t *done, void *private, zio_priority_t priority,
     enum zio_flag flags, boolean_t labels);
 

@@ -510,10 +571,13 @@
 extern zio_t *zio_write_phys(zio_t *pio, vdev_t *vd, uint64_t offset,
     uint64_t size, struct abd *data, int checksum,
     zio_done_func_t *done, void *private, zio_priority_t priority,
     enum zio_flag flags, boolean_t labels);
 
+extern zio_t *zio_wbc(zio_type_t type, vdev_t *vd, abd_t *data,
+    uint64_t size, uint64_t offset);
+
 extern zio_t *zio_free_sync(zio_t *pio, spa_t *spa, uint64_t txg,
     const blkptr_t *bp, enum zio_flag flags);
 
 extern int zio_alloc_zil(spa_t *spa, uint64_t txg, blkptr_t *new_bp,
     blkptr_t *old_bp, uint64_t size, boolean_t *slog);

@@ -602,11 +666,10 @@
 extern void zfs_ereport_finish_checksum(zio_cksum_report_t *report,
     const void *good_data, const void *bad_data, boolean_t drop_if_identical);
 
 extern void zfs_ereport_send_interim_checksum(zio_cksum_report_t *report);
 extern void zfs_ereport_free_checksum(zio_cksum_report_t *report);
-
 /* If we have the good data in hand, this function can be used */
 extern void zfs_ereport_post_checksum(spa_t *spa, vdev_t *vd,
     struct zio *zio, uint64_t offset, uint64_t length,
     const void *good_data, const void *bad_data, struct zio_bad_cksum *info);
 

@@ -617,10 +680,13 @@
 boolean_t zbookmark_subtree_completed(const struct dnode_phys *dnp,
     const zbookmark_phys_t *subtree_root, const zbookmark_phys_t *last_block);
 int zbookmark_compare(uint16_t dbss1, uint8_t ibs1, uint16_t dbss2,
     uint8_t ibs2, const zbookmark_phys_t *zb1, const zbookmark_phys_t *zb2);
 
+/* best effort dedup */
+void zio_best_effort_dedup(zio_t *zio);
+
 #ifdef  __cplusplus
 }
 #endif
 
 #endif  /* _ZIO_H */