Print this page
NEX-13140 DVA-throttle support for special-class
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5318 Cleanup specialclass property (obsolete, not used) and fix related meta-to-special case
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5058 WBC: Race between the purging of window and opening new one
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-2830 ZFS smart compression
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4044 remove sha1crc32 in preparation with upstream merge of edon-r and skien
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Conflicts:
usr/src/uts/common/fs/zfs/sys/zio_checksum.h
NEX-4028 use lz4 by default
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4003 WRC: System panics on debug build
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Conflicts:
usr/src/uts/common/io/scsi/targets/sd.c
usr/src/uts/common/sys/scsi/targets/sddef.h
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
OS-70 remove zio timer code
Issues #7: Reconsile L2ARC and "special" use by datasets
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12643 rb4064 ZFS meta refactoring - vdev utilization tracking, auto-dedup
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
re #12393 rb3935 Kerberos and smbd disagree about who is our AD server (fix elf runtime attributes check)
re #11612 rb3907 Failing vdev of a mirrored pool should not take zfs operations out of action for extended periods of time.
re #8346 rb2639 KT disk failures
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
@@ -19,11 +19,11 @@
* CDDL HEADER END
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
+ * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2012, 2017 by Delphix. All rights reserved.
* Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
* Copyright (c) 2013, Joyent, Inc. All rights reserved.
* Copyright 2016 Toomas Soome <tsoome@me.com>
*/
@@ -42,10 +42,20 @@
#ifdef __cplusplus
extern "C" {
#endif
/*
+ * Checksum state w.r.t. SHA256 acceleration.
+ */
+typedef enum {
+ CKSTATE_NONE = 0,
+ CKSTATE_WAITING,
+ CKSTATE_CHECKSUMMING,
+ CKSTATE_CHECKSUM_DONE
+} zio_checksum_state_t;
+
+/*
* Embedded checksum
*/
#define ZEC_MAGIC 0x210da7ab10c7a11ULL
typedef struct zio_eck {
@@ -136,10 +146,22 @@
#define ZIO_FAILURE_MODE_WAIT 0
#define ZIO_FAILURE_MODE_CONTINUE 1
#define ZIO_FAILURE_MODE_PANIC 2
+/*
+ * Macro for asserting validity of the priorities obtained by conversion
+ * from CoS/vdev properties
+ */
+#define ZIO_PRIORITY_QUEUEABLE_VALID(prio) \
+ (((prio) >= ZIO_PRIORITY_SYNC_READ) && \
+ ((prio) < ZIO_PRIORITY_NUM_QUEUEABLE))
+
+#define ZIO_PIPELINE_CONTINUE 0x100
+#define ZIO_PIPELINE_STOP 0x101
+#define ZIO_PIPELINE_RESTART_STAGE 0x102
+
enum zio_flag {
/*
* Flags inherited by gang, ddt, and vdev children,
* and that must be equal for two zios to aggregate
*/
@@ -204,31 +226,20 @@
(((zio)->io_flags & ZIO_FLAG_GANG_INHERIT) | \
ZIO_FLAG_GANG_CHILD | ZIO_FLAG_CANFAIL)
#define ZIO_VDEV_CHILD_FLAGS(zio) \
(((zio)->io_flags & ZIO_FLAG_VDEV_INHERIT) | \
- ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_CANFAIL)
+ ZIO_FLAG_CANFAIL)
-#define ZIO_CHILD_BIT(x) (1 << (x))
-#define ZIO_CHILD_BIT_IS_SET(val, x) ((val) & (1 << (x)))
-
enum zio_child {
ZIO_CHILD_VDEV = 0,
ZIO_CHILD_GANG,
ZIO_CHILD_DDT,
ZIO_CHILD_LOGICAL,
ZIO_CHILD_TYPES
};
-#define ZIO_CHILD_VDEV_BIT ZIO_CHILD_BIT(ZIO_CHILD_VDEV)
-#define ZIO_CHILD_GANG_BIT ZIO_CHILD_BIT(ZIO_CHILD_GANG)
-#define ZIO_CHILD_DDT_BIT ZIO_CHILD_BIT(ZIO_CHILD_DDT)
-#define ZIO_CHILD_LOGICAL_BIT ZIO_CHILD_BIT(ZIO_CHILD_LOGICAL)
-#define ZIO_CHILD_ALL_BITS \
- (ZIO_CHILD_VDEV_BIT | ZIO_CHILD_GANG_BIT | \
- ZIO_CHILD_DDT_BIT | ZIO_CHILD_LOGICAL_BIT)
-
enum zio_wait_type {
ZIO_WAIT_READY = 0,
ZIO_WAIT_DONE,
ZIO_WAIT_TYPES
};
@@ -243,10 +254,12 @@
typedef void zio_done_func_t(zio_t *zio);
extern boolean_t zio_dva_throttle_enabled;
extern const char *zio_type_name[ZIO_TYPES];
+struct range_tree;
+
/*
* A bookmark is a four-tuple <objset, object, level, blkid> that uniquely
* identifies any block in the pool. By convention, the meta-objset (MOS)
* is objset 0, and the meta-dnode is object 0. This covers all blocks
* except root blocks and ZIL blocks, which are defined as follows:
@@ -302,13 +315,17 @@
enum zio_checksum zp_checksum;
enum zio_compress zp_compress;
dmu_object_type_t zp_type;
uint8_t zp_level;
uint8_t zp_copies;
- boolean_t zp_dedup;
- boolean_t zp_dedup_verify;
+ uint8_t zp_dedup;
+ uint8_t zp_dedup_verify;
boolean_t zp_nopwrite;
+ boolean_t zp_metadata;
+ boolean_t zp_usesc;
+ boolean_t zp_usewbc;
+ uint64_t zp_zpl_meta_to_special;
} zio_prop_t;
typedef struct zio_cksum_report zio_cksum_report_t;
typedef void zio_cksum_finish_f(zio_cksum_report_t *rep,
@@ -384,10 +401,34 @@
zio_t *zl_child;
list_node_t zl_parent_node;
list_node_t zl_child_node;
} zio_link_t;
+/*
+ * When smart compression is enabled, this callback info structure is
+ * passed to write zio's to monitor per-object compression performance.
+ *
+ * When zio_write determines that the `compression' setting for the dataset
+ * is not `off', if `sc_ask' is not NULL, it will call the `sc_ask' callback
+ * function, asking the upper layers whether it should really try to compress
+ * the object in question. If the function returns B_TRUE, compression is
+ * attempted. Once compression is done, sc_result is called to inform the
+ * upper layers of the compression result. By comparing the zio's io_size to
+ * io_orig_size it can monitor compression performance on the particular
+ * object in question (if io_size == io_orig_size, then compression failed).
+ * It is not legal to pass a NULL sc_ask but non-NULL sc_result to zio_write.
+ */
+typedef struct zio_smartcomp_info {
+ boolean_t (*sc_ask)(void *userinfo, const zio_t *);
+ void (*sc_result)(void *userinfo, const zio_t *);
+ void *sc_userinfo;
+} zio_smartcomp_info_t;
+
+#define ZIO_SHOULD_COMPRESS(zio) \
+ ((zio)->io_smartcomp.sc_ask == NULL || \
+ (zio)->io_smartcomp.sc_ask((zio)->io_smartcomp.sc_userinfo, (zio)))
+
struct zio {
/* Core information about this I/O */
zbookmark_phys_t io_bookmark;
zio_prop_t io_prop;
zio_type_t io_type;
@@ -403,10 +444,11 @@
blkptr_t io_bp_copy;
list_t io_parent_list;
list_t io_child_list;
zio_t *io_logical;
zio_transform_t *io_transform_stack;
+ zio_smartcomp_info_t io_smartcomp;
/* Callback info */
zio_done_func_t *io_ready;
zio_done_func_t *io_children_ready;
zio_done_func_t *io_physdone;
@@ -463,10 +505,24 @@
zio_cksum_report_t *io_cksum_report;
uint64_t io_ena;
/* Taskq dispatching state */
taskq_ent_t io_tqent;
+
+ /* Timestamp for tracking vdev I/O latency */
+ hrtime_t io_vd_timestamp;
+
+ /* Checksum acceleration */
+ zio_checksum_state_t zio_checksum_state;
+ zio_cksum_t *zio_checksump;
+ void *zio_checksum_datap;
+ uint64_t zio_checksum_data_size;
+ struct zio *zio_checksum_next;
+ zio_cksum_t actual_cksum;
+
+ /* Metaslab class that will be used */
+ metaslab_class_t *io_mc;
};
extern int zio_bookmark_compare(const void *, const void *);
extern zio_t *zio_null(zio_t *pio, spa_t *spa, vdev_t *vd,
@@ -482,11 +538,12 @@
extern zio_t *zio_write(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
struct abd *data, uint64_t size, uint64_t psize, const zio_prop_t *zp,
zio_done_func_t *ready, zio_done_func_t *children_ready,
zio_done_func_t *physdone, zio_done_func_t *done,
void *private, zio_priority_t priority, enum zio_flag flags,
- const zbookmark_phys_t *zb);
+ const zbookmark_phys_t *zb,
+ const zio_smartcomp_info_t *smartcomp);
extern zio_t *zio_rewrite(zio_t *pio, spa_t *spa, uint64_t txg, blkptr_t *bp,
struct abd *data, uint64_t size, zio_done_func_t *done, void *private,
zio_priority_t priority, enum zio_flag flags, zbookmark_phys_t *zb);
@@ -500,10 +557,14 @@
zio_done_func_t *done, void *private, enum zio_flag flags);
extern zio_t *zio_ioctl(zio_t *pio, spa_t *spa, vdev_t *vd, int cmd,
zio_done_func_t *done, void *private, enum zio_flag flags);
+extern zio_t *zio_trim(spa_t *spa, vdev_t *vd, struct range_tree *tree,
+ zio_done_func_t *done, void *private, enum zio_flag flags,
+ int dkiocfree_flags, metaslab_t *msp);
+
extern zio_t *zio_read_phys(zio_t *pio, vdev_t *vd, uint64_t offset,
uint64_t size, struct abd *data, int checksum,
zio_done_func_t *done, void *private, zio_priority_t priority,
enum zio_flag flags, boolean_t labels);
@@ -510,10 +571,13 @@
extern zio_t *zio_write_phys(zio_t *pio, vdev_t *vd, uint64_t offset,
uint64_t size, struct abd *data, int checksum,
zio_done_func_t *done, void *private, zio_priority_t priority,
enum zio_flag flags, boolean_t labels);
+extern zio_t *zio_wbc(zio_type_t type, vdev_t *vd, abd_t *data,
+ uint64_t size, uint64_t offset);
+
extern zio_t *zio_free_sync(zio_t *pio, spa_t *spa, uint64_t txg,
const blkptr_t *bp, enum zio_flag flags);
extern int zio_alloc_zil(spa_t *spa, uint64_t txg, blkptr_t *new_bp,
blkptr_t *old_bp, uint64_t size, boolean_t *slog);
@@ -602,11 +666,10 @@
extern void zfs_ereport_finish_checksum(zio_cksum_report_t *report,
const void *good_data, const void *bad_data, boolean_t drop_if_identical);
extern void zfs_ereport_send_interim_checksum(zio_cksum_report_t *report);
extern void zfs_ereport_free_checksum(zio_cksum_report_t *report);
-
/* If we have the good data in hand, this function can be used */
extern void zfs_ereport_post_checksum(spa_t *spa, vdev_t *vd,
struct zio *zio, uint64_t offset, uint64_t length,
const void *good_data, const void *bad_data, struct zio_bad_cksum *info);
@@ -617,10 +680,13 @@
boolean_t zbookmark_subtree_completed(const struct dnode_phys *dnp,
const zbookmark_phys_t *subtree_root, const zbookmark_phys_t *last_block);
int zbookmark_compare(uint16_t dbss1, uint8_t ibs1, uint16_t dbss2,
uint8_t ibs2, const zbookmark_phys_t *zb1, const zbookmark_phys_t *zb2);
+/* best effort dedup */
+void zio_best_effort_dedup(zio_t *zio);
+
#ifdef __cplusplus
}
#endif
#endif /* _ZIO_H */