Print this page
NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
Conflicts:
usr/src/uts/common/fs/zfs/dbuf.c
usr/src/uts/common/fs/zfs/dmu.c
usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3214 remove cos object type from dmu.h
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-5366 Race between unique_insert() and unique_remove() causes ZFS fsid change
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Vatca <dan.vatca@gmail.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5987 zfs prefetch code needs work
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5692 expose the number of hole blocks in a file
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-3669 Faults for fans that don't exist
Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com>
NEX-3891 Hide the snapshots that belong to in-kernel autosnap-service
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3212 remove vdev prop object type from dmu.h
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issue #40: ZDB shouldn't crash with new code
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
Fixup merge results
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
Bug 10481 - Dry run option in 'zfs send' isn't the same as in NexentaStor 3.1
*** 20,30 ****
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2017 by Delphix. All rights reserved.
! * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2012, Joyent, Inc. All rights reserved.
* Copyright 2013 DEY Storage Systems, Inc.
* Copyright 2014 HybridCluster. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
* Copyright 2013 Saso Kiselkov. All rights reserved.
--- 20,30 ----
*/
/*
* Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
* Copyright (c) 2011, 2017 by Delphix. All rights reserved.
! * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
* Copyright (c) 2012, Joyent, Inc. All rights reserved.
* Copyright 2013 DEY Storage Systems, Inc.
* Copyright 2014 HybridCluster. All rights reserved.
* Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
* Copyright 2013 Saso Kiselkov. All rights reserved.
*** 107,117 ****
#define DMU_OT_BYTESWAP_MASK 0x3f
/*
* Defines a uint8_t object type. Object types specify if the data
* in the object is metadata (boolean) and how to byteswap the data
! * (dmu_object_byteswap_t).
*/
#define DMU_OT(byteswap, metadata) \
(DMU_OT_NEWTYPE | \
((metadata) ? DMU_OT_METADATA : 0) | \
((byteswap) & DMU_OT_BYTESWAP_MASK))
--- 107,118 ----
#define DMU_OT_BYTESWAP_MASK 0x3f
/*
* Defines a uint8_t object type. Object types specify if the data
* in the object is metadata (boolean) and how to byteswap the data
! * (dmu_object_byteswap_t). All of the types created by this method
! * are cached in the dbuf metadata cache.
*/
#define DMU_OT(byteswap, metadata) \
(DMU_OT_NEWTYPE | \
((metadata) ? DMU_OT_METADATA : 0) | \
((byteswap) & DMU_OT_BYTESWAP_MASK))
*** 122,131 ****
--- 123,135 ----
#define DMU_OT_IS_METADATA(ot) (((ot) & DMU_OT_NEWTYPE) ? \
((ot) & DMU_OT_METADATA) : \
dmu_ot[(ot)].ot_metadata)
+ #define DMU_OT_IS_METADATA_CACHED(ot) (((ot) & DMU_OT_NEWTYPE) ? \
+ B_TRUE : dmu_ot[(ot)].ot_dbuf_metadata_cache)
+
/*
* These object types use bp_fill != 1 for their L0 bp's. Therefore they can't
* have their data embedded (i.e. use a BP_IS_EMBEDDED() bp), because bp_fill
* is repurposed for embedded BPs.
*/
*** 229,247 ****
DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE),
DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE),
DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE),
} dmu_object_type_t;
/*
! * These flags are intended to be used to specify the "txg_how"
! * parameter when calling the dmu_tx_assign() function. See the comment
! * above dmu_tx_assign() for more details on the meaning of these flags.
*/
! #define TXG_NOWAIT (0ULL)
! #define TXG_WAIT (1ULL<<0)
! #define TXG_NOTHROTTLE (1ULL<<1)
void byteswap_uint64_array(void *buf, size_t size);
void byteswap_uint32_array(void *buf, size_t size);
void byteswap_uint16_array(void *buf, size_t size);
void byteswap_uint8_array(void *buf, size_t size);
void zap_byteswap(void *buf, size_t size);
--- 233,262 ----
DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE),
DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE),
DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE),
} dmu_object_type_t;
+ typedef enum txg_how {
+ TXG_WAIT = 1,
+ TXG_NOWAIT,
+ TXG_WAITED,
+ } txg_how_t;
+
/*
! * Selected classes of metadata
*/
! #define DMU_OT_IS_DDT_META(type) \
! ((type == DMU_OT_DDT_ZAP) || \
! (type == DMU_OT_DDT_STATS))
+ #define DMU_OT_IS_ZPL_META(type) \
+ ((type == DMU_OT_ZNODE) || \
+ (type == DMU_OT_OLDACL) || \
+ (type == DMU_OT_DIRECTORY_CONTENTS) || \
+ (type == DMU_OT_MASTER_NODE) || \
+ (type == DMU_OT_UNLINKED_SET))
+
void byteswap_uint64_array(void *buf, size_t size);
void byteswap_uint32_array(void *buf, size_t size);
void byteswap_uint16_array(void *buf, size_t size);
void byteswap_uint8_array(void *buf, size_t size);
void zap_byteswap(void *buf, size_t size);
*** 289,299 ****
int dmu_objset_find(char *name, int func(const char *, void *), void *arg,
int flags);
void dmu_objset_byteswap(void *buf, size_t size);
int dsl_dataset_rename_snapshot(const char *fsname,
const char *oldsnapname, const char *newsnapname, boolean_t recursive);
- int dmu_objset_remap_indirects(const char *fsname);
typedef struct dmu_buf {
uint64_t db_object; /* object that this buffer is part of */
uint64_t db_offset; /* byte offset in this object */
uint64_t db_size; /* size of buffer in bytes */
--- 304,313 ----
*** 326,339 ****
#define DMU_POOL_FREE_BPOBJ "free_bpobj"
#define DMU_POOL_BPTREE_OBJ "bptree_obj"
#define DMU_POOL_EMPTY_BPOBJ "empty_bpobj"
#define DMU_POOL_CHECKSUM_SALT "org.illumos:checksum_salt"
#define DMU_POOL_VDEV_ZAP_MAP "com.delphix:vdev_zap_map"
- #define DMU_POOL_REMOVING "com.delphix:removing"
- #define DMU_POOL_OBSOLETE_BPOBJ "com.delphix:obsolete_bpobj"
- #define DMU_POOL_CONDENSING_INDIRECT "com.delphix:condensing_indirect"
/*
* Allocate an object from this objset. The range of object numbers
* available is (0, DN_MAX_OBJECT). Object 0 is the meta-dnode.
*
* The transaction must be assigned to a txg. The newly allocated
--- 340,355 ----
#define DMU_POOL_FREE_BPOBJ "free_bpobj"
#define DMU_POOL_BPTREE_OBJ "bptree_obj"
#define DMU_POOL_EMPTY_BPOBJ "empty_bpobj"
#define DMU_POOL_CHECKSUM_SALT "org.illumos:checksum_salt"
#define DMU_POOL_VDEV_ZAP_MAP "com.delphix:vdev_zap_map"
+ #define DMU_POOL_COS_PROPS "cos_props"
+ #define DMU_POOL_VDEV_PROPS "vdev_props"
+ #define DMU_POOL_TRIM_START_TIME "trim_start_time"
+ #define DMU_POOL_TRIM_STOP_TIME "trim_stop_time"
+
/*
* Allocate an object from this objset. The range of object numbers
* available is (0, DN_MAX_OBJECT). Object 0 is the meta-dnode.
*
* The transaction must be assigned to a txg. The newly allocated
*** 412,423 ****
* apply to all newly written blocks; existing blocks will not be affected.
*/
void dmu_object_set_compress(objset_t *os, uint64_t object, uint8_t compress,
dmu_tx_t *tx);
- int dmu_object_remap_indirects(objset_t *os, uint64_t object, uint64_t txg);
-
void
dmu_write_embedded(objset_t *os, uint64_t object, uint64_t offset,
void *data, uint8_t etype, uint8_t comp, int uncompressed_size,
int compressed_size, int byteorder, dmu_tx_t *tx);
--- 428,437 ----
*** 426,443 ****
*/
#define WP_NOFILL 0x1
#define WP_DMU_SYNC 0x2
#define WP_SPILL 0x4
void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp,
struct zio_prop *zp);
/*
* The bonus data is accessed more or less like a regular buffer.
* You must dmu_bonus_hold() to get the buffer, which will give you a
* dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
! * data. As with any normal buffer, you must call dmu_buf_will_dirty()
! * before modifying it, and the
* object must be held in an assigned transaction before calling
* dmu_buf_will_dirty. You may use dmu_buf_set_user() on the bonus
* buffer as well. You must release your hold with dmu_buf_rele().
*
* Returns ENOENT, EIO, or 0.
--- 440,469 ----
*/
#define WP_NOFILL 0x1
#define WP_DMU_SYNC 0x2
#define WP_SPILL 0x4
+ #define WP_SPECIALCLASS_SHIFT (16)
+ #define WP_SPECIALCLASS_BITS (1) /* 1 bits per storage class */
+ #define WP_SPECIALCLASS_MASK (((1 << WP_SPECIALCLASS_BITS) - 1) \
+ << WP_SPECIALCLASS_SHIFT)
+
+ #define WP_SET_SPECIALCLASS(flags, sclass) { \
+ flags |= ((sclass << WP_SPECIALCLASS_SHIFT) & WP_SPECIALCLASS_MASK); \
+ }
+
+ #define WP_GET_SPECIALCLASS(flags) \
+ ((flags & WP_SPECIALCLASS_MASK) >> WP_SPECIALCLASS_SHIFT)
+
void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp,
struct zio_prop *zp);
/*
* The bonus data is accessed more or less like a regular buffer.
* You must dmu_bonus_hold() to get the buffer, which will give you a
* dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
! * data. As with any normal buffer, you must call dmu_buf_read() to
! * read db_data, dmu_buf_will_dirty() before modifying it, and the
* object must be held in an assigned transaction before calling
* dmu_buf_will_dirty. You may use dmu_buf_set_user() on the bonus
* buffer as well. You must release your hold with dmu_buf_rele().
*
* Returns ENOENT, EIO, or 0.
*** 650,659 ****
--- 676,686 ----
* The transaction (tx) must be assigned to a txg (ie. you've called
* dmu_tx_assign()). The buffer's object must be held in the tx
* (ie. you've called dmu_tx_hold_object(tx, db->db_object)).
*/
void dmu_buf_will_dirty(dmu_buf_t *db, dmu_tx_t *tx);
+ void dmu_buf_will_dirty_sc(dmu_buf_t *db, dmu_tx_t *tx, boolean_t sc);
/*
* You must create a transaction, then hold the objects which you will
* (or might) modify as part of this transaction. Then you must assign
* the transaction to a transaction group. Once the transaction has
*** 680,700 ****
int len);
void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
uint64_t len);
void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
uint64_t len);
- void dmu_tx_hold_remap_l1indirect(dmu_tx_t *tx, uint64_t object);
void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
void dmu_tx_hold_zap_by_dnode(dmu_tx_t *tx, dnode_t *dn, int add,
const char *name);
void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
void dmu_tx_hold_bonus_by_dnode(dmu_tx_t *tx, dnode_t *dn);
void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
void dmu_tx_abort(dmu_tx_t *tx);
! int dmu_tx_assign(dmu_tx_t *tx, uint64_t txg_how);
void dmu_tx_wait(dmu_tx_t *tx);
void dmu_tx_commit(dmu_tx_t *tx);
void dmu_tx_mark_netfree(dmu_tx_t *tx);
/*
--- 707,726 ----
int len);
void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
uint64_t len);
void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
uint64_t len);
void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
void dmu_tx_hold_zap_by_dnode(dmu_tx_t *tx, dnode_t *dn, int add,
const char *name);
void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
void dmu_tx_hold_bonus_by_dnode(dmu_tx_t *tx, dnode_t *dn);
void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
void dmu_tx_abort(dmu_tx_t *tx);
! int dmu_tx_assign(dmu_tx_t *tx, enum txg_how txg_how);
void dmu_tx_wait(dmu_tx_t *tx);
void dmu_tx_commit(dmu_tx_t *tx);
void dmu_tx_mark_netfree(dmu_tx_t *tx);
/*
*** 796,805 ****
--- 822,832 ----
typedef void arc_byteswap_func_t(void *buf, size_t size);
typedef struct dmu_object_type_info {
dmu_object_byteswap_t ot_byteswap;
boolean_t ot_metadata;
+ boolean_t ot_dbuf_metadata_cache;
char *ot_name;
} dmu_object_type_info_t;
typedef struct dmu_object_byteswap_info {
arc_byteswap_func_t *ob_func;
*** 832,841 ****
--- 859,869 ----
uint64_t dds_num_clones; /* number of clones of this */
uint64_t dds_creation_txg;
uint64_t dds_guid;
dmu_objset_type_t dds_type;
uint8_t dds_is_snapshot;
+ uint8_t dds_is_autosnapshot;
uint8_t dds_inconsistent;
char dds_origin[ZFS_MAX_DATASET_NAME_LEN];
} dmu_objset_stats_t;
/*
*** 885,894 ****
--- 913,924 ----
extern void dmu_objset_name(objset_t *os, char *buf);
extern dmu_objset_type_t dmu_objset_type(objset_t *os);
extern uint64_t dmu_objset_id(objset_t *os);
extern zfs_sync_type_t dmu_objset_syncprop(objset_t *os);
extern zfs_logbias_op_t dmu_objset_logbias(objset_t *os);
+ int dmu_clone_list_next(objset_t *os, int len, char *name,
+ uint64_t *idp, uint64_t *offp);
extern int dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
uint64_t *id, uint64_t *offp, boolean_t *case_conflict);
extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
int maxlen, boolean_t *conflict);
extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,