NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-3214 remove cos object type from dmu.h
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-5366 Race between unique_insert() and unique_remove() causes ZFS fsid change
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Vatca <dan.vatca@gmail.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5987 zfs prefetch code needs work
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5269 zpool import slow
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5692 expose the number of hole blocks in a file
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Boris Protopopov <bprotopopov@hotmail.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-3669 Faults for fans that don't exist
Reviewed by: Jeffry Molanus <jeffry.molanus@nexenta.com>
NEX-3891 Hide the snapshots that belong to in-kernel autosnap-service
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
NEX-3558 KRRP Integration
NEX-3212 remove vdev prop object type from dmu.h
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issue #40: ZDB shouldn't crash with new code
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
Fixup merge results
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
Bug 10481 - Dry run option in 'zfs send' isn't the same as in NexentaStor 3.1


   5  * Common Development and Distribution License (the "License").
   6  * You may not use this file except in compliance with the License.
   7  *
   8  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
   9  * or http://www.opensolaris.org/os/licensing.
  10  * See the License for the specific language governing permissions
  11  * and limitations under the License.
  12  *
  13  * When distributing Covered Code, include this CDDL HEADER in each
  14  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15  * If applicable, add the following below this CDDL HEADER, with the
  16  * fields enclosed by brackets "[]" replaced with your own identifying
  17  * information: Portions Copyright [yyyy] [name of copyright owner]
  18  *
  19  * CDDL HEADER END
  20  */
  21 
  22 /*
  23  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  24  * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
  25  * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
  26  * Copyright (c) 2012, Joyent, Inc. All rights reserved.
  27  * Copyright 2013 DEY Storage Systems, Inc.
  28  * Copyright 2014 HybridCluster. All rights reserved.
  29  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
  30  * Copyright 2013 Saso Kiselkov. All rights reserved.
  31  * Copyright (c) 2014 Integros [integros.com]
  32  */
  33 
  34 /* Portions Copyright 2010 Robert Milkowski */
  35 
  36 #ifndef _SYS_DMU_H
  37 #define _SYS_DMU_H
  38 
  39 /*
  40  * This file describes the interface that the DMU provides for its
  41  * consumers.
  42  *
  43  * The DMU also interacts with the SPA.  That interface is described in
  44  * dmu_spa.h.
  45  */


  92         DMU_BSWAP_ZNODE,
  93         DMU_BSWAP_OLDACL,
  94         DMU_BSWAP_ACL,
  95         /*
  96          * Allocating a new byteswap type number makes the on-disk format
  97          * incompatible with any other format that uses the same number.
  98          *
  99          * Data can usually be structured to work with one of the
 100          * DMU_BSWAP_UINT* or DMU_BSWAP_ZAP types.
 101          */
 102         DMU_BSWAP_NUMFUNCS
 103 } dmu_object_byteswap_t;
 104 
 105 #define DMU_OT_NEWTYPE 0x80
 106 #define DMU_OT_METADATA 0x40
 107 #define DMU_OT_BYTESWAP_MASK 0x3f
 108 
 109 /*
 110  * Defines a uint8_t object type. Object types specify if the data
 111  * in the object is metadata (boolean) and how to byteswap the data
 112  * (dmu_object_byteswap_t).
 113  */
 114 #define DMU_OT(byteswap, metadata) \
 115         (DMU_OT_NEWTYPE | \
 116         ((metadata) ? DMU_OT_METADATA : 0) | \
 117         ((byteswap) & DMU_OT_BYTESWAP_MASK))
 118 
 119 #define DMU_OT_IS_VALID(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 120         ((ot) & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS : \
 121         (ot) < DMU_OT_NUMTYPES)
 122 
 123 #define DMU_OT_IS_METADATA(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 124         ((ot) & DMU_OT_METADATA) : \
 125         dmu_ot[(ot)].ot_metadata)
 126 
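The bit layout used by DMU_OT() can be checked in isolation. The sketch below copies the three mask values and the DMU_OT() macro from the listing; everything prefixed `ex_` or `EX_` is hypothetical, and the enum is a shortened stand-in whose values are illustrative rather than the on-disk byteswap numbers. It only decodes new-style types (bit 0x80 set) and does not model the `dmu_ot[]` table lookup used for legacy types.

```c
#include <assert.h>
#include <stdint.h>

/* Bit layout copied from the header listing. */
#define DMU_OT_NEWTYPE 0x80
#define DMU_OT_METADATA 0x40
#define DMU_OT_BYTESWAP_MASK 0x3f

#define DMU_OT(byteswap, metadata) \
	(DMU_OT_NEWTYPE | \
	((metadata) ? DMU_OT_METADATA : 0) | \
	((byteswap) & DMU_OT_BYTESWAP_MASK))

/* Hypothetical stand-in for dmu_object_byteswap_t; values are illustrative. */
enum ex_bswap {
	EX_BSWAP_UINT8,		/* 0 */
	EX_BSWAP_UINT64,	/* 1 */
	EX_BSWAP_ZAP,		/* 2 */
	EX_BSWAP_NUMFUNCS	/* 3 */
};

/* Bit 7 marks a type built with DMU_OT(). */
static int
ex_is_newtype(uint8_t ot)
{
	return ((ot & DMU_OT_NEWTYPE) != 0);
}

/* Bit 6 carries the metadata flag. */
static int
ex_is_metadata(uint8_t ot)
{
	return ((ot & DMU_OT_METADATA) != 0);
}

/* The low six bits select the byteswap function. */
static int
ex_byteswap(uint8_t ot)
{
	return (ot & DMU_OT_BYTESWAP_MASK);
}

/* New-style types are valid only if the byteswap index is in range. */
static int
ex_is_valid(uint8_t ot)
{
	return (ex_is_newtype(ot) && ex_byteswap(ot) < EX_BSWAP_NUMFUNCS);
}
```

With these stand-in values, `DMU_OT(EX_BSWAP_ZAP, 1)` evaluates to 0xc2: 0x80 for the new-type bit, 0x40 for metadata, and 2 for the byteswap index.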



 127 /*
 128  * These object types use bp_fill != 1 for their L0 bp's. Therefore they can't
 129  * have their data embedded (i.e. use a BP_IS_EMBEDDED() bp), because bp_fill
 130  * is repurposed for embedded BPs.
 131  */
 132 #define DMU_OT_HAS_FILL(ot) \
 133         ((ot) == DMU_OT_DNODE || (ot) == DMU_OT_OBJSET)
 134 
 135 #define DMU_OT_BYTESWAP(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 136         ((ot) & DMU_OT_BYTESWAP_MASK) : \
 137         dmu_ot[(ot)].ot_byteswap)
 138 
 139 typedef enum dmu_object_type {
 140         DMU_OT_NONE,
 141         /* general: */
 142         DMU_OT_OBJECT_DIRECTORY,        /* ZAP */
 143         DMU_OT_OBJECT_ARRAY,            /* UINT64 */
 144         DMU_OT_PACKED_NVLIST,           /* UINT8 (XDR by nvlist_pack/unpack) */
 145         DMU_OT_PACKED_NVLIST_SIZE,      /* UINT64 */
 146         DMU_OT_BPOBJ,                   /* UINT64 */


 214          * of indexing into dmu_ot directly (this works for both DMU_OT_* types
 215          * and DMU_OTN_* types).
 216          */
 217         DMU_OT_NUMTYPES,
 218 
 219         /*
 220          * Names for valid types declared with DMU_OT().
 221          */
 222         DMU_OTN_UINT8_DATA = DMU_OT(DMU_BSWAP_UINT8, B_FALSE),
 223         DMU_OTN_UINT8_METADATA = DMU_OT(DMU_BSWAP_UINT8, B_TRUE),
 224         DMU_OTN_UINT16_DATA = DMU_OT(DMU_BSWAP_UINT16, B_FALSE),
 225         DMU_OTN_UINT16_METADATA = DMU_OT(DMU_BSWAP_UINT16, B_TRUE),
 226         DMU_OTN_UINT32_DATA = DMU_OT(DMU_BSWAP_UINT32, B_FALSE),
 227         DMU_OTN_UINT32_METADATA = DMU_OT(DMU_BSWAP_UINT32, B_TRUE),
 228         DMU_OTN_UINT64_DATA = DMU_OT(DMU_BSWAP_UINT64, B_FALSE),
 229         DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE),
 230         DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE),
 231         DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE),
 232 } dmu_object_type_t;
 233 






 234 /*
 235  * These flags are intended to be used to specify the "txg_how"
 236  * parameter when calling the dmu_tx_assign() function. See the comment
 237  * above dmu_tx_assign() for more details on the meaning of these flags.
 238  */
 239 #define TXG_NOWAIT      (0ULL)
 240 #define TXG_WAIT        (1ULL<<0)
 241 #define TXG_NOTHROTTLE  (1ULL<<1)
 242 
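Because these values are bit flags and TXG_NOWAIT is zero, a caller cannot test for TXG_NOWAIT with a bitwise AND; it has to test for the absence of TXG_WAIT. The sketch below copies the three flag values from the listing; the two `ex_` helpers are hypothetical and exist only to illustrate the checks.

```c
#include <assert.h>
#include <stdint.h>

/* Flag values copied from the header listing. */
#define TXG_NOWAIT	(0ULL)
#define TXG_WAIT	(1ULL<<0)
#define TXG_NOTHROTTLE	(1ULL<<1)

/* Hypothetical helper: reject any bit outside the two defined flags. */
static int
ex_txg_how_valid(uint64_t txg_how)
{
	return ((txg_how & ~(TXG_WAIT | TXG_NOTHROTTLE)) == 0);
}

/*
 * Hypothetical helper: TXG_NOWAIT is zero, so "no wait" is expressed
 * by TXG_WAIT being clear rather than by a bit being set.
 */
static int
ex_txg_how_blocks(uint64_t txg_how)
{
	return ((txg_how & TXG_WAIT) != 0);
}
```

For example, `TXG_WAIT | TXG_NOTHROTTLE` is a valid blocking request, while `TXG_NOTHROTTLE` alone is valid but non-blocking.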







 243 void byteswap_uint64_array(void *buf, size_t size);
 244 void byteswap_uint32_array(void *buf, size_t size);
 245 void byteswap_uint16_array(void *buf, size_t size);
 246 void byteswap_uint8_array(void *buf, size_t size);
 247 void zap_byteswap(void *buf, size_t size);
 248 void zfs_oldacl_byteswap(void *buf, size_t size);
 249 void zfs_acl_byteswap(void *buf, size_t size);
 250 void zfs_znode_byteswap(void *buf, size_t size);
 251 
 252 #define DS_FIND_SNAPSHOTS       (1<<0)
 253 #define DS_FIND_CHILDREN        (1<<1)
 254 #define DS_FIND_SERIALIZE       (1<<2)
 255 
 256 /*
 257  * The maximum number of bytes that can be accessed as part of one
 258  * operation, including metadata.
 259  */
 260 #define DMU_MAX_ACCESS (32 * 1024 * 1024) /* 32MB */
 261 #define DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */
 262 


 274 int dmu_objset_hold(const char *name, void *tag, objset_t **osp);
 275 int dmu_objset_own(const char *name, dmu_objset_type_t type,
 276     boolean_t readonly, void *tag, objset_t **osp);
 277 void dmu_objset_rele(objset_t *os, void *tag);
 278 void dmu_objset_disown(objset_t *os, void *tag);
 279 int dmu_objset_open_ds(struct dsl_dataset *ds, objset_t **osp);
 280 
 281 void dmu_objset_evict_dbufs(objset_t *os);
 282 int dmu_objset_create(const char *name, dmu_objset_type_t type, uint64_t flags,
 283     void (*func)(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx), void *arg);
 284 int dmu_objset_clone(const char *name, const char *origin);
 285 int dsl_destroy_snapshots_nvl(struct nvlist *snaps, boolean_t defer,
 286     struct nvlist *errlist);
 287 int dmu_objset_snapshot_one(const char *fsname, const char *snapname);
 288 int dmu_objset_snapshot_tmp(const char *, const char *, int);
 289 int dmu_objset_find(char *name, int func(const char *, void *), void *arg,
 290     int flags);
 291 void dmu_objset_byteswap(void *buf, size_t size);
 292 int dsl_dataset_rename_snapshot(const char *fsname,
 293     const char *oldsnapname, const char *newsnapname, boolean_t recursive);
 294 int dmu_objset_remap_indirects(const char *fsname);
 295 
 296 typedef struct dmu_buf {
 297         uint64_t db_object;             /* object that this buffer is part of */
 298         uint64_t db_offset;             /* byte offset in this object */
 299         uint64_t db_size;               /* size of buffer in bytes */
 300         void *db_data;                  /* data in buffer */
 301 } dmu_buf_t;
 302 
 303 /*
 304  * The names of zap entries in the DIRECTORY_OBJECT of the MOS.
 305  */
 306 #define DMU_POOL_DIRECTORY_OBJECT       1
 307 #define DMU_POOL_CONFIG                 "config"
 308 #define DMU_POOL_FEATURES_FOR_WRITE     "features_for_write"
 309 #define DMU_POOL_FEATURES_FOR_READ      "features_for_read"
 310 #define DMU_POOL_FEATURE_DESCRIPTIONS   "feature_descriptions"
 311 #define DMU_POOL_FEATURE_ENABLED_TXG    "feature_enabled_txg"
 312 #define DMU_POOL_ROOT_DATASET           "root_dataset"
 313 #define DMU_POOL_SYNC_BPOBJ             "sync_bplist"
 314 #define DMU_POOL_ERRLOG_SCRUB           "errlog_scrub"
 315 #define DMU_POOL_ERRLOG_LAST            "errlog_last"
 316 #define DMU_POOL_SPARES                 "spares"
 317 #define DMU_POOL_DEFLATE                "deflate"
 318 #define DMU_POOL_HISTORY                "history"
 319 #define DMU_POOL_PROPS                  "pool_props"
 320 #define DMU_POOL_L2CACHE                "l2cache"
 321 #define DMU_POOL_TMP_USERREFS           "tmp_userrefs"
 322 #define DMU_POOL_DDT                    "DDT-%s-%s-%s"
 323 #define DMU_POOL_DDT_STATS              "DDT-statistics"
 324 #define DMU_POOL_CREATION_VERSION       "creation_version"
 325 #define DMU_POOL_SCAN                   "scan"
 326 #define DMU_POOL_FREE_BPOBJ             "free_bpobj"
 327 #define DMU_POOL_BPTREE_OBJ             "bptree_obj"
 328 #define DMU_POOL_EMPTY_BPOBJ            "empty_bpobj"
 329 #define DMU_POOL_CHECKSUM_SALT          "org.illumos:checksum_salt"
 330 #define DMU_POOL_VDEV_ZAP_MAP           "com.delphix:vdev_zap_map"
 331 #define DMU_POOL_REMOVING               "com.delphix:removing"
 332 #define DMU_POOL_OBSOLETE_BPOBJ         "com.delphix:obsolete_bpobj"
 333 #define DMU_POOL_CONDENSING_INDIRECT    "com.delphix:condensing_indirect"
 334 





 335 /*
 336  * Allocate an object from this objset.  The range of object numbers
 337  * available is (0, DN_MAX_OBJECT).  Object 0 is the meta-dnode.
 338  *
 339  * The transaction must be assigned to a txg.  The newly allocated
 340  * object will be "held" in the transaction (i.e. you can modify the
 341  * newly allocated object in this transaction).
 342  *
 343  * dmu_object_alloc() chooses an object and returns it in *objectp.
 344  *
 345  * dmu_object_claim() allocates a specific object number.  If that
 346  * number is already allocated, it fails and returns EEXIST.
 347  *
 348  * Return 0 on success, or ENOSPC or EEXIST as specified above.
 349  */
 350 uint64_t dmu_object_alloc(objset_t *os, dmu_object_type_t ot,
 351     int blocksize, dmu_object_type_t bonus_type, int bonus_len, dmu_tx_t *tx);
 352 int dmu_object_claim(objset_t *os, uint64_t object, dmu_object_type_t ot,
 353     int blocksize, dmu_object_type_t bonus_type, int bonus_len, dmu_tx_t *tx);
 354 int dmu_object_reclaim(objset_t *os, uint64_t object, dmu_object_type_t ot,


 397  * Returns 0 on success, or EBUSY if there are any holds on the object
 398  * contents, or ENOTSUP as described above.
 399  */
 400 int dmu_object_set_blocksize(objset_t *os, uint64_t object, uint64_t size,
 401     int ibs, dmu_tx_t *tx);
 402 
 403 /*
 404  * Set the checksum property on a dnode.  The new checksum algorithm will
 405  * apply to all newly written blocks; existing blocks will not be affected.
 406  */
 407 void dmu_object_set_checksum(objset_t *os, uint64_t object, uint8_t checksum,
 408     dmu_tx_t *tx);
 409 
 410 /*
 411  * Set the compress property on a dnode.  The new compression algorithm will
 412  * apply to all newly written blocks; existing blocks will not be affected.
 413  */
 414 void dmu_object_set_compress(objset_t *os, uint64_t object, uint8_t compress,
 415     dmu_tx_t *tx);
 416 
 417 int dmu_object_remap_indirects(objset_t *os, uint64_t object, uint64_t txg);
 418 
 419 void
 420 dmu_write_embedded(objset_t *os, uint64_t object, uint64_t offset,
 421     void *data, uint8_t etype, uint8_t comp, int uncompressed_size,
 422     int compressed_size, int byteorder, dmu_tx_t *tx);
 423 
 424 /*
 425  * Decide how to write a block: checksum, compression, number of copies, etc.
 426  */
 427 #define WP_NOFILL       0x1
 428 #define WP_DMU_SYNC     0x2
 429 #define WP_SPILL        0x4
 430 












 431 void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp,
 432     struct zio_prop *zp);
 433 /*
 434  * The bonus data is accessed more or less like a regular buffer.
 435  * You must dmu_bonus_hold() to get the buffer, which will give you a
 436  * dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
 437  * data.  As with any normal buffer, you must call dmu_buf_will_dirty()
 438  * before modifying it, and the object must be held in an assigned
 439  * transaction before calling dmu_buf_will_dirty().  You may use
 440  * dmu_buf_set_user() on the bonus buffer as well.  You must release
 441  * your hold with dmu_buf_rele().
 442  *
 443  * Returns ENOENT, EIO, or 0.
 444  */
 445 int dmu_bonus_hold(objset_t *os, uint64_t object, void *tag, dmu_buf_t **);
 446 int dmu_bonus_max(void);
 447 int dmu_set_bonus(dmu_buf_t *, int, dmu_tx_t *);
 448 int dmu_set_bonustype(dmu_buf_t *, dmu_object_type_t, dmu_tx_t *);
 449 dmu_object_type_t dmu_get_bonustype(dmu_buf_t *);
 450 int dmu_rm_spill(objset_t *, uint64_t, dmu_tx_t *);
 451 
 452 /*
 453  * Special spill buffer support used by "SA" framework
 454  */
 455 
 456 int dmu_spill_hold_by_bonus(dmu_buf_t *bonus, void *tag, dmu_buf_t **dbp);
 457 int dmu_spill_hold_by_dnode(dnode_t *dn, uint32_t flags,
 458     void *tag, dmu_buf_t **dbp);


 635 objset_t *dmu_buf_get_objset(dmu_buf_t *db);
 636 dnode_t *dmu_buf_dnode_enter(dmu_buf_t *db);
 637 void dmu_buf_dnode_exit(dmu_buf_t *db);
 638 
 639 /* Block until any in-progress dmu buf user evictions complete. */
 640 void dmu_buf_user_evict_wait(void);
 641 
 642 /*
 643  * Returns the blkptr associated with this dbuf, or NULL if not set.
 644  */
 645 struct blkptr *dmu_buf_get_blkptr(dmu_buf_t *db);
 646 
 647 /*
 648  * Indicate that you are going to modify the buffer's data (db_data).
 649  *
 650  * The transaction (tx) must be assigned to a txg (i.e. you've called
 651  * dmu_tx_assign()).  The buffer's object must be held in the tx
 652  * (i.e. you've called one of the dmu_tx_hold_*() functions on it).
 653  */
 654 void dmu_buf_will_dirty(dmu_buf_t *db, dmu_tx_t *tx);

 655 
 656 /*
 657  * You must create a transaction, then hold the objects which you will
 658  * (or might) modify as part of this transaction.  Then you must assign
 659  * the transaction to a transaction group.  Once the transaction has
 660  * been assigned, you can modify buffers which belong to held objects as
 661  * part of this transaction.  You can't modify buffers before the
 662  * transaction has been assigned; you can't modify buffers which don't
 663  * belong to objects which this transaction holds; you can't hold
 664  * objects once the transaction has been assigned.  You may hold an
 665  * object which you are going to free (with dmu_object_free()), but you
 666  * don't have to.
 667  *
 668  * You can abort the transaction before it has been assigned.
 669  *
 670  * Note that you may hold buffers (with dmu_buf_hold) at any time,
 671  * regardless of transaction state.
 672  */
 673 
 674 #define DMU_NEW_OBJECT  (-1ULL)
 675 #define DMU_OBJECT_END  (-1ULL)
 676 
 677 dmu_tx_t *dmu_tx_create(objset_t *os);
 678 void dmu_tx_hold_write(dmu_tx_t *tx, uint64_t object, uint64_t off, int len);
 679 void dmu_tx_hold_write_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
 680     int len);
 681 void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
 682     uint64_t len);
 683 void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
 684     uint64_t len);
 685 void dmu_tx_hold_remap_l1indirect(dmu_tx_t *tx, uint64_t object);
 686 void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
 687 void dmu_tx_hold_zap_by_dnode(dmu_tx_t *tx, dnode_t *dn, int add,
 688     const char *name);
 689 void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
 690 void dmu_tx_hold_bonus_by_dnode(dmu_tx_t *tx, dnode_t *dn);
 691 void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
 692 void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
 693 void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
 694 void dmu_tx_abort(dmu_tx_t *tx);
 695 int dmu_tx_assign(dmu_tx_t *tx, uint64_t txg_how);
 696 void dmu_tx_wait(dmu_tx_t *tx);
 697 void dmu_tx_commit(dmu_tx_t *tx);
 698 void dmu_tx_mark_netfree(dmu_tx_t *tx);
 699 
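The ordering rules in the comment above (create, hold, assign, modify, commit) can be modelled as a small state machine. This is a user-space sketch, not the DMU implementation: every `ex_` name is hypothetical, and the helpers return error codes purely for illustration. The model permits abort only before assignment, which is the one case the comment documents.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define EX_MAX_HOLDS 8

/* Hypothetical model of a transaction; not the real dmu_tx_t. */
typedef enum { EX_TX_OPEN, EX_TX_ASSIGNED, EX_TX_DONE } ex_tx_state_t;

typedef struct ex_tx {
	ex_tx_state_t state;
	uint64_t held[EX_MAX_HOLDS];
	int nheld;
} ex_tx_t;

static void
ex_tx_create(ex_tx_t *tx)
{
	tx->state = EX_TX_OPEN;
	tx->nheld = 0;
}

/* Objects can only be held before the transaction is assigned. */
static int
ex_tx_hold(ex_tx_t *tx, uint64_t object)
{
	if (tx->state != EX_TX_OPEN)
		return (EINVAL);
	if (tx->nheld == EX_MAX_HOLDS)
		return (ENOSPC);
	tx->held[tx->nheld++] = object;
	return (0);
}

static int
ex_tx_assign(ex_tx_t *tx)
{
	if (tx->state != EX_TX_OPEN)
		return (EINVAL);
	tx->state = EX_TX_ASSIGNED;
	return (0);
}

/* Modification needs an assigned transaction that holds the object. */
static int
ex_tx_can_modify(const ex_tx_t *tx, uint64_t object)
{
	if (tx->state != EX_TX_ASSIGNED)
		return (0);
	for (int i = 0; i < tx->nheld; i++) {
		if (tx->held[i] == object)
			return (1);
	}
	return (0);
}

/* The model allows abort only before assignment. */
static int
ex_tx_abort(ex_tx_t *tx)
{
	if (tx->state != EX_TX_OPEN)
		return (EINVAL);
	tx->state = EX_TX_DONE;
	return (0);
}

static int
ex_tx_commit(ex_tx_t *tx)
{
	if (tx->state != EX_TX_ASSIGNED)
		return (EINVAL);
	tx->state = EX_TX_DONE;
	return (0);
}
```

In this model, a caller that holds object 42, assigns, and then tries to hold object 7 gets EINVAL, matching the rule that objects cannot be held once the transaction has been assigned.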
 700 /*
 701  * To register a commit callback, dmu_tx_callback_register() must be called.
 702  *
 703  * dcb_data is a pointer to caller private data that is passed on as a
 704  * callback parameter. The caller is responsible for properly allocating and
 705  * freeing it.
 706  *
 707  * When registering a callback, the transaction must be already created, but
 708  * it cannot be committed or aborted. It can be assigned to a txg or not.
 709  *
 710  * The callback will be called after the transaction has been safely written
 711  * to stable storage and will also be called if the dmu_tx is aborted.
 712  * If there is any error which prevents the transaction from being committed to
 713  * disk, the callback will be called with a value of error != 0.
 714  */
 715 typedef void dmu_tx_callback_func_t(void *dcb_data, int error);
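The callback contract described above (caller-owned data, a status of 0 on success and nonzero on failure or abort) can be sketched with a stand-alone callback list. Only the `dmu_tx_callback_func_t` typedef is taken from the header; the `ex_` list type and helpers are hypothetical, and using ECANCELED as the abort status is an assumption made for this sketch.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Same shape as the typedef in the header listing. */
typedef void dmu_tx_callback_func_t(void *dcb_data, int error);

#define EX_MAX_CB 4

/* Hypothetical stand-in for a transaction's callback list. */
typedef struct ex_cb_list {
	dmu_tx_callback_func_t *func[EX_MAX_CB];
	void *data[EX_MAX_CB];
	int n;
} ex_cb_list_t;

/* Remember a callback and its caller-owned data pointer. */
static int
ex_cb_register(ex_cb_list_t *l, dmu_tx_callback_func_t *func, void *data)
{
	if (l->n == EX_MAX_CB)
		return (ENOSPC);
	l->func[l->n] = func;
	l->data[l->n] = data;
	l->n++;
	return (0);
}

/* Call every callback once with the given status, then empty the list. */
static void
ex_cb_fire(ex_cb_list_t *l, int error)
{
	for (int i = 0; i < l->n; i++)
		l->func[i](l->data[i], error);
	l->n = 0;
}

/* Example callback: records the status in storage owned by the caller. */
static void
ex_record_status(void *dcb_data, int error)
{
	*(int *)dcb_data = error;
}
```

A caller would register `ex_record_status` with a pointer to its own `int`, then inspect that value after the list fires: 0 means the commit path ran, and any other value means the transaction did not reach stable storage.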


 781         uint32_t doi_data_block_size;
 782         uint32_t doi_metadata_block_size;
 783         dmu_object_type_t doi_type;
 784         dmu_object_type_t doi_bonus_type;
 785         uint64_t doi_bonus_size;
 786         uint8_t doi_indirection;                /* 2 = dnode->indirect->data */
 787         uint8_t doi_checksum;
 788         uint8_t doi_compress;
 789         uint8_t doi_nblkptr;
 790         uint8_t doi_pad[4];
 791         uint64_t doi_physical_blocks_512;       /* data + metadata, 512b blks */
 792         uint64_t doi_max_offset;
 793         uint64_t doi_fill_count;                /* number of non-empty blocks */
 794 } dmu_object_info_t;
 795 
 796 typedef void arc_byteswap_func_t(void *buf, size_t size);
 797 
 798 typedef struct dmu_object_type_info {
 799         dmu_object_byteswap_t   ot_byteswap;
 800         boolean_t               ot_metadata;
 801         char                    *ot_name;
 802 } dmu_object_type_info_t;
 803 
 804 typedef struct dmu_object_byteswap_info {
 805         arc_byteswap_func_t     *ob_func;
 806         char                    *ob_name;
 807 } dmu_object_byteswap_info_t;
 808 
 809 extern const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES];
 810 extern const dmu_object_byteswap_info_t dmu_ot_byteswap[DMU_BSWAP_NUMFUNCS];
 811 
 812 /*
 813  * Get information on a DMU object.
 814  *
 815  * Return 0 on success or ENOENT if object is not allocated.
 816  *
 817  * If doi is NULL, just indicates whether the object exists.
 818  */
 819 int dmu_object_info(objset_t *os, uint64_t object, dmu_object_info_t *doi);
 820 /* Like dmu_object_info, but faster if you have a held dnode in hand. */
 821 void dmu_object_info_from_dnode(dnode_t *dn, dmu_object_info_t *doi);
 822 /* Like dmu_object_info, but faster if you have a held dbuf in hand. */
 823 void dmu_object_info_from_db(dmu_buf_t *db, dmu_object_info_t *doi);
 824 /*
 825  * Like dmu_object_info_from_db, but faster still when you only care about
 826  * the size.  This is specifically optimized for zfs_getattr().
 827  */
 828 void dmu_object_size_from_db(dmu_buf_t *db, uint32_t *blksize,
 829     u_longlong_t *nblk512);
 830 
 831 typedef struct dmu_objset_stats {
 832         uint64_t dds_num_clones; /* number of clones of this */
 833         uint64_t dds_creation_txg;
 834         uint64_t dds_guid;
 835         dmu_objset_type_t dds_type;
 836         uint8_t dds_is_snapshot;
 837         uint8_t dds_inconsistent;
 838         char dds_origin[ZFS_MAX_DATASET_NAME_LEN];
 839 } dmu_objset_stats_t;
 840 
 841 /*
 842  * Get stats on a dataset.
 843  */
 844 void dmu_objset_fast_stat(objset_t *os, dmu_objset_stats_t *stat);
 845 
 846 /*
 847  * Add entries to the nvlist for all the objset's properties.  See
 848  * zfs_prop_table[] and zfs(1m) for details on the properties.
 849  */
 850 void dmu_objset_stats(objset_t *os, struct nvlist *nv);
 851 
 852 /*
 853  * Get the space usage statistics for statvfs().
 854  *
 855  * refdbytes is the amount of space "referenced" by this objset.
 856  * availbytes is the amount of space available to this objset, taking


 870  * change, so there is a small probability that it will collide.)
 871  */
 872 uint64_t dmu_objset_fsid_guid(objset_t *os);
 873 
 874 /*
 875  * Get the [cm]time for an objset's snapshot dir
 876  */
 877 timestruc_t dmu_objset_snap_cmtime(objset_t *os);
 878 
 879 int dmu_objset_is_snapshot(objset_t *os);
 880 
 881 extern struct spa *dmu_objset_spa(objset_t *os);
 882 extern struct zilog *dmu_objset_zil(objset_t *os);
 883 extern struct dsl_pool *dmu_objset_pool(objset_t *os);
 884 extern struct dsl_dataset *dmu_objset_ds(objset_t *os);
 885 extern void dmu_objset_name(objset_t *os, char *buf);
 886 extern dmu_objset_type_t dmu_objset_type(objset_t *os);
 887 extern uint64_t dmu_objset_id(objset_t *os);
 888 extern zfs_sync_type_t dmu_objset_syncprop(objset_t *os);
 889 extern zfs_logbias_op_t dmu_objset_logbias(objset_t *os);


 890 extern int dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
 891     uint64_t *id, uint64_t *offp, boolean_t *case_conflict);
 892 extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
 893     int maxlen, boolean_t *conflict);
 894 extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,
 895     uint64_t *idp, uint64_t *offp);
 896 
 897 typedef int objset_used_cb_t(dmu_object_type_t bonustype,
 898     void *bonus, uint64_t *userp, uint64_t *groupp);
 899 extern void dmu_objset_register_type(dmu_objset_type_t ost,
 900     objset_used_cb_t *cb);
 901 extern void dmu_objset_set_user(objset_t *os, void *user_ptr);
 902 extern void *dmu_objset_get_user(objset_t *os);
 903 
 904 /*
 905  * Return the txg number for the given assigned transaction.
 906  */
 907 uint64_t dmu_tx_get_txg(dmu_tx_t *tx);
 908 
 909 /*




   5  * Common Development and Distribution License (the "License").
   6  * You may not use this file except in compliance with the License.
   7  *
   8  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
   9  * or http://www.opensolaris.org/os/licensing.
  10  * See the License for the specific language governing permissions
  11  * and limitations under the License.
  12  *
  13  * When distributing Covered Code, include this CDDL HEADER in each
  14  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15  * If applicable, add the following below this CDDL HEADER, with the
  16  * fields enclosed by brackets "[]" replaced with your own identifying
  17  * information: Portions Copyright [yyyy] [name of copyright owner]
  18  *
  19  * CDDL HEADER END
  20  */
  21 
  22 /*
  23  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  24  * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
  25  * Copyright 2016 Nexenta Systems, Inc. All rights reserved.
  26  * Copyright (c) 2012, Joyent, Inc. All rights reserved.
  27  * Copyright 2013 DEY Storage Systems, Inc.
  28  * Copyright 2014 HybridCluster. All rights reserved.
  29  * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
  30  * Copyright 2013 Saso Kiselkov. All rights reserved.
  31  * Copyright (c) 2014 Integros [integros.com]
  32  */
  33 
  34 /* Portions Copyright 2010 Robert Milkowski */
  35 
  36 #ifndef _SYS_DMU_H
  37 #define _SYS_DMU_H
  38 
  39 /*
  40  * This file describes the interface that the DMU provides for its
  41  * consumers.
  42  *
  43  * The DMU also interacts with the SPA.  That interface is described in
  44  * dmu_spa.h.
  45  */


  92         DMU_BSWAP_ZNODE,
  93         DMU_BSWAP_OLDACL,
  94         DMU_BSWAP_ACL,
  95         /*
  96          * Allocating a new byteswap type number makes the on-disk format
  97          * incompatible with any other format that uses the same number.
  98          *
  99          * Data can usually be structured to work with one of the
 100          * DMU_BSWAP_UINT* or DMU_BSWAP_ZAP types.
 101          */
 102         DMU_BSWAP_NUMFUNCS
 103 } dmu_object_byteswap_t;
 104 
 105 #define DMU_OT_NEWTYPE 0x80
 106 #define DMU_OT_METADATA 0x40
 107 #define DMU_OT_BYTESWAP_MASK 0x3f
 108 
 109 /*
 110  * Defines a uint8_t object type. Object types specify if the data
 111  * in the object is metadata (boolean) and how to byteswap the data
 112  * (dmu_object_byteswap_t). All of the types created by this method
 113  * are cached in the dbuf metadata cache.
 114  */
 115 #define DMU_OT(byteswap, metadata) \
 116         (DMU_OT_NEWTYPE | \
 117         ((metadata) ? DMU_OT_METADATA : 0) | \
 118         ((byteswap) & DMU_OT_BYTESWAP_MASK))
 119 
 120 #define DMU_OT_IS_VALID(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 121         ((ot) & DMU_OT_BYTESWAP_MASK) < DMU_BSWAP_NUMFUNCS : \
 122         (ot) < DMU_OT_NUMTYPES)
 123 
 124 #define DMU_OT_IS_METADATA(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 125         ((ot) & DMU_OT_METADATA) : \
 126         dmu_ot[(ot)].ot_metadata)
 127 
 128 #define DMU_OT_IS_METADATA_CACHED(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 129         B_TRUE : dmu_ot[(ot)].ot_dbuf_metadata_cache)
 130 
 131 /*
 132  * These object types use bp_fill != 1 for their L0 bp's. Therefore they can't
 133  * have their data embedded (i.e. use a BP_IS_EMBEDDED() bp), because bp_fill
 134  * is repurposed for embedded BPs.
 135  */
 136 #define DMU_OT_HAS_FILL(ot) \
 137         ((ot) == DMU_OT_DNODE || (ot) == DMU_OT_OBJSET)
 138 
 139 #define DMU_OT_BYTESWAP(ot) (((ot) & DMU_OT_NEWTYPE) ? \
 140         ((ot) & DMU_OT_BYTESWAP_MASK) : \
 141         dmu_ot[(ot)].ot_byteswap)
 142 
 143 typedef enum dmu_object_type {
 144         DMU_OT_NONE,
 145         /* general: */
 146         DMU_OT_OBJECT_DIRECTORY,        /* ZAP */
 147         DMU_OT_OBJECT_ARRAY,            /* UINT64 */
 148         DMU_OT_PACKED_NVLIST,           /* UINT8 (XDR by nvlist_pack/unpack) */
 149         DMU_OT_PACKED_NVLIST_SIZE,      /* UINT64 */
 150         DMU_OT_BPOBJ,                   /* UINT64 */


 218          * of indexing into dmu_ot directly (this works for both DMU_OT_* types
 219          * and DMU_OTN_* types).
 220          */
 221         DMU_OT_NUMTYPES,
 222 
 223         /*
 224          * Names for valid types declared with DMU_OT().
 225          */
 226         DMU_OTN_UINT8_DATA = DMU_OT(DMU_BSWAP_UINT8, B_FALSE),
 227         DMU_OTN_UINT8_METADATA = DMU_OT(DMU_BSWAP_UINT8, B_TRUE),
 228         DMU_OTN_UINT16_DATA = DMU_OT(DMU_BSWAP_UINT16, B_FALSE),
 229         DMU_OTN_UINT16_METADATA = DMU_OT(DMU_BSWAP_UINT16, B_TRUE),
 230         DMU_OTN_UINT32_DATA = DMU_OT(DMU_BSWAP_UINT32, B_FALSE),
 231         DMU_OTN_UINT32_METADATA = DMU_OT(DMU_BSWAP_UINT32, B_TRUE),
 232         DMU_OTN_UINT64_DATA = DMU_OT(DMU_BSWAP_UINT64, B_FALSE),
 233         DMU_OTN_UINT64_METADATA = DMU_OT(DMU_BSWAP_UINT64, B_TRUE),
 234         DMU_OTN_ZAP_DATA = DMU_OT(DMU_BSWAP_ZAP, B_FALSE),
 235         DMU_OTN_ZAP_METADATA = DMU_OT(DMU_BSWAP_ZAP, B_TRUE),
 236 } dmu_object_type_t;
 237 
 238 typedef enum txg_how {
 239         TXG_WAIT = 1,
 240         TXG_NOWAIT,
 241         TXG_WAITED,
 242 } txg_how_t;
 243 
 244 /*
 245  * Selected classes of metadata
 246  */
 247 #define DMU_OT_IS_DDT_META(type)        \
 248         (((type) == DMU_OT_DDT_ZAP) ||  \
 249         ((type) == DMU_OT_DDT_STATS))
 250 
 251 #define DMU_OT_IS_ZPL_META(type)                        \
 252         (((type) == DMU_OT_ZNODE) ||                    \
 253         ((type) == DMU_OT_OLDACL) ||                    \
 254         ((type) == DMU_OT_DIRECTORY_CONTENTS) ||        \
 255         ((type) == DMU_OT_MASTER_NODE) ||               \
 256         ((type) == DMU_OT_UNLINKED_SET))
 257 
 258 void byteswap_uint64_array(void *buf, size_t size);
 259 void byteswap_uint32_array(void *buf, size_t size);
 260 void byteswap_uint16_array(void *buf, size_t size);
 261 void byteswap_uint8_array(void *buf, size_t size);
 262 void zap_byteswap(void *buf, size_t size);
 263 void zfs_oldacl_byteswap(void *buf, size_t size);
 264 void zfs_acl_byteswap(void *buf, size_t size);
 265 void zfs_znode_byteswap(void *buf, size_t size);
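All of the byteswap routines above share the `(void *buf, size_t size)` shape, where size is a byte count. As a rough illustration of what a routine like byteswap_uint64_array() has to do, here is a portable sketch. The `sk_` names are hypothetical and this is not the in-tree implementation, which may use platform byteswap macros instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Reverse the byte order of one 64-bit value using shifts and masks. */
static uint64_t
sk_bswap64(uint64_t x)
{
	x = ((x & 0x00ff00ff00ff00ffULL) << 8) |
	    ((x >> 8) & 0x00ff00ff00ff00ffULL);
	x = ((x & 0x0000ffff0000ffffULL) << 16) |
	    ((x >> 16) & 0x0000ffff0000ffffULL);
	return ((x << 32) | (x >> 32));
}

/* Swap every 64-bit word in buf in place; size is in bytes. */
static void
sk_byteswap_uint64_array(void *buf, size_t size)
{
	uint64_t *p = buf;
	size_t count = size / sizeof (uint64_t);

	/* The sketch assumes the buffer holds whole 64-bit words. */
	assert(size % sizeof (uint64_t) == 0);
	for (size_t i = 0; i < count; i++)
		p[i] = sk_bswap64(p[i]);
}
```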
 266 
 267 #define DS_FIND_SNAPSHOTS       (1<<0)
 268 #define DS_FIND_CHILDREN        (1<<1)
 269 #define DS_FIND_SERIALIZE       (1<<2)
 270 
 271 /*
 272  * The maximum number of bytes that can be accessed as part of one
 273  * operation, including metadata.
 274  */
 275 #define DMU_MAX_ACCESS (32 * 1024 * 1024) /* 32MB */
 276 #define DMU_MAX_DELETEBLKCNT (20480) /* ~5MB of indirect blocks */
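Because one operation may touch at most DMU_MAX_ACCESS bytes including metadata, a caller with a larger range has to split it. The sketch below shows one way to walk a range in bounded chunks. The helper names and the callback type are hypothetical, and how much headroom a real caller leaves for metadata is not stated in this excerpt, so the chunk size is a parameter.

```c
#include <stdint.h>

#define	SK_MAX_ACCESS	(32ULL * 1024 * 1024)	/* mirrors DMU_MAX_ACCESS */

/* Hypothetical per-chunk worker; returns 0 or an error code. */
typedef int sk_chunk_fn(uint64_t off, uint64_t len, void *arg);

/*
 * Call fn on consecutive pieces of [off, off + len), each at most
 * chunk bytes long.  Stops at the first error and returns it.
 */
static int
sk_for_each_chunk(uint64_t off, uint64_t len, uint64_t chunk,
    sk_chunk_fn *fn, void *arg)
{
	if (chunk == 0 || chunk > SK_MAX_ACCESS)
		return (-1);

	while (len > 0) {
		uint64_t n = (len < chunk) ? len : chunk;
		int err = fn(off, n, arg);

		if (err != 0)
			return (err);
		off += n;
		len -= n;
	}
	return (0);
}

/* Example worker that only records what it was asked to do. */
typedef struct sk_chunk_log {
	int count;
	uint64_t last_off;
	uint64_t last_len;
} sk_chunk_log_t;

static int
sk_log_chunk(uint64_t off, uint64_t len, void *arg)
{
	sk_chunk_log_t *log = arg;

	log->count++;
	log->last_off = off;
	log->last_len = len;
	return (0);
}
```

In a real caller each chunk would typically get its own transaction, so a chunk size below the limit leaves room for the metadata that the same operation dirties.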
 277 


 289 int dmu_objset_hold(const char *name, void *tag, objset_t **osp);
 290 int dmu_objset_own(const char *name, dmu_objset_type_t type,
 291     boolean_t readonly, void *tag, objset_t **osp);
 292 void dmu_objset_rele(objset_t *os, void *tag);
 293 void dmu_objset_disown(objset_t *os, void *tag);
 294 int dmu_objset_open_ds(struct dsl_dataset *ds, objset_t **osp);
 295 
 296 void dmu_objset_evict_dbufs(objset_t *os);
 297 int dmu_objset_create(const char *name, dmu_objset_type_t type, uint64_t flags,
 298     void (*func)(objset_t *os, void *arg, cred_t *cr, dmu_tx_t *tx), void *arg);
 299 int dmu_objset_clone(const char *name, const char *origin);
 300 int dsl_destroy_snapshots_nvl(struct nvlist *snaps, boolean_t defer,
 301     struct nvlist *errlist);
 302 int dmu_objset_snapshot_one(const char *fsname, const char *snapname);
 303 int dmu_objset_snapshot_tmp(const char *, const char *, int);
 304 int dmu_objset_find(char *name, int func(const char *, void *), void *arg,
 305     int flags);
 306 void dmu_objset_byteswap(void *buf, size_t size);
 307 int dsl_dataset_rename_snapshot(const char *fsname,
 308     const char *oldsnapname, const char *newsnapname, boolean_t recursive);
 309 
 310 typedef struct dmu_buf {
 311         uint64_t db_object;             /* object that this buffer is part of */
 312         uint64_t db_offset;             /* byte offset in this object */
 313         uint64_t db_size;               /* size of buffer in bytes */
 314         void *db_data;                  /* data in buffer */
 315 } dmu_buf_t;
 316 
 317 /*
 318  * The names of zap entries in the DIRECTORY_OBJECT of the MOS.
 319  */
 320 #define DMU_POOL_DIRECTORY_OBJECT       1
 321 #define DMU_POOL_CONFIG                 "config"
 322 #define DMU_POOL_FEATURES_FOR_WRITE     "features_for_write"
 323 #define DMU_POOL_FEATURES_FOR_READ      "features_for_read"
 324 #define DMU_POOL_FEATURE_DESCRIPTIONS   "feature_descriptions"
 325 #define DMU_POOL_FEATURE_ENABLED_TXG    "feature_enabled_txg"
 326 #define DMU_POOL_ROOT_DATASET           "root_dataset"
 327 #define DMU_POOL_SYNC_BPOBJ             "sync_bplist"
 328 #define DMU_POOL_ERRLOG_SCRUB           "errlog_scrub"
 329 #define DMU_POOL_ERRLOG_LAST            "errlog_last"
 330 #define DMU_POOL_SPARES                 "spares"
 331 #define DMU_POOL_DEFLATE                "deflate"
 332 #define DMU_POOL_HISTORY                "history"
 333 #define DMU_POOL_PROPS                  "pool_props"
 334 #define DMU_POOL_L2CACHE                "l2cache"
 335 #define DMU_POOL_TMP_USERREFS           "tmp_userrefs"
 336 #define DMU_POOL_DDT                    "DDT-%s-%s-%s"
 337 #define DMU_POOL_DDT_STATS              "DDT-statistics"
 338 #define DMU_POOL_CREATION_VERSION       "creation_version"
 339 #define DMU_POOL_SCAN                   "scan"
 340 #define DMU_POOL_FREE_BPOBJ             "free_bpobj"
 341 #define DMU_POOL_BPTREE_OBJ             "bptree_obj"
 342 #define DMU_POOL_EMPTY_BPOBJ            "empty_bpobj"
 343 #define DMU_POOL_CHECKSUM_SALT          "org.illumos:checksum_salt"
 344 #define DMU_POOL_VDEV_ZAP_MAP           "com.delphix:vdev_zap_map"
 345 
 346 #define DMU_POOL_COS_PROPS              "cos_props"
 347 #define DMU_POOL_VDEV_PROPS             "vdev_props"
 348 #define DMU_POOL_TRIM_START_TIME        "trim_start_time"
 349 #define DMU_POOL_TRIM_STOP_TIME         "trim_stop_time"
 350 
 351 /*
 352  * Allocate an object from this objset.  The range of object numbers
 353  * available is (0, DN_MAX_OBJECT).  Object 0 is the meta-dnode.
 354  *
 355  * The transaction must be assigned to a txg.  The newly allocated
 356  * object will be "held" in the transaction (i.e. you can modify the
 357  * newly allocated object in this transaction).
 358  *
 359  * dmu_object_alloc() chooses an object and returns its object number.
 360  *
 361  * dmu_object_claim() allocates a specific object number.  If that
 362  * number is already allocated, it fails and returns EEXIST.
 363  *
 364  * Return 0 on success, or ENOSPC or EEXIST as specified above.
 365  */
 366 uint64_t dmu_object_alloc(objset_t *os, dmu_object_type_t ot,
 367     int blocksize, dmu_object_type_t bonus_type, int bonus_len, dmu_tx_t *tx);
 368 int dmu_object_claim(objset_t *os, uint64_t object, dmu_object_type_t ot,
 369     int blocksize, dmu_object_type_t bonus_type, int bonus_len, dmu_tx_t *tx);
 370 int dmu_object_reclaim(objset_t *os, uint64_t object, dmu_object_type_t ot,


 413  * Returns 0 on success, or EBUSY if there are any holds on the object
 414  * contents, or ENOTSUP as described above.
 415  */
 416 int dmu_object_set_blocksize(objset_t *os, uint64_t object, uint64_t size,
 417     int ibs, dmu_tx_t *tx);
 418 
 419 /*
 420  * Set the checksum property on a dnode.  The new checksum algorithm will
 421  * apply to all newly written blocks; existing blocks will not be affected.
 422  */
 423 void dmu_object_set_checksum(objset_t *os, uint64_t object, uint8_t checksum,
 424     dmu_tx_t *tx);
 425 
 426 /*
 427  * Set the compress property on a dnode.  The new compression algorithm will
 428  * apply to all newly written blocks; existing blocks will not be affected.
 429  */
 430 void dmu_object_set_compress(objset_t *os, uint64_t object, uint8_t compress,
 431     dmu_tx_t *tx);
 432 
 433 void
 434 dmu_write_embedded(objset_t *os, uint64_t object, uint64_t offset,
 435     void *data, uint8_t etype, uint8_t comp, int uncompressed_size,
 436     int compressed_size, int byteorder, dmu_tx_t *tx);
 437 
 438 /*
 439  * Decide how to write a block: checksum, compression, number of copies, etc.
 440  */
 441 #define WP_NOFILL       0x1
 442 #define WP_DMU_SYNC     0x2
 443 #define WP_SPILL        0x4
 444 
 445 #define WP_SPECIALCLASS_SHIFT   (16)
 446 #define WP_SPECIALCLASS_BITS    (1) /* 1 bit per storage class */
 447 #define WP_SPECIALCLASS_MASK    (((1 << WP_SPECIALCLASS_BITS) - 1) \
 448         << WP_SPECIALCLASS_SHIFT)
 449 
 450 #define WP_SET_SPECIALCLASS(flags, sclass)      { \
 451         flags |= (((sclass) << WP_SPECIALCLASS_SHIFT) & WP_SPECIALCLASS_MASK); \
 452 }
 453 
 454 #define WP_GET_SPECIALCLASS(flags) \
 455         (((flags) & WP_SPECIALCLASS_MASK) >> WP_SPECIALCLASS_SHIFT)
 456 
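The special-class macros pack a storage class into the write-policy flags word above the WP_* bits. A minimal standalone sketch of the same packing follows, written as functions so the behaviour is easy to test. The `sk_` function names are hypothetical; the constants are copied from the listing.

```c
#define	WP_NOFILL	0x1
#define	WP_DMU_SYNC	0x2
#define	WP_SPILL	0x4

#define	WP_SPECIALCLASS_SHIFT	(16)
#define	WP_SPECIALCLASS_BITS	(1)
#define	WP_SPECIALCLASS_MASK	(((1 << WP_SPECIALCLASS_BITS) - 1) \
	<< WP_SPECIALCLASS_SHIFT)

/* OR the class into the flags word; bits outside the mask are dropped. */
static inline int
sk_wp_set_specialclass(int flags, int sclass)
{
	return (flags | ((sclass << WP_SPECIALCLASS_SHIFT) &
	    WP_SPECIALCLASS_MASK));
}

/* Extract the class from the flags word. */
static inline int
sk_wp_get_specialclass(int flags)
{
	return ((flags & WP_SPECIALCLASS_MASK) >> WP_SPECIALCLASS_SHIFT);
}
```

Two properties follow from the definitions: with WP_SPECIALCLASS_BITS set to 1, any class value above 1 is silently masked away, and setting only ORs bits in, so it cannot clear a class that was set earlier.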
 457 void dmu_write_policy(objset_t *os, dnode_t *dn, int level, int wp,
 458     struct zio_prop *zp);
 459 /*
 460  * The bonus data is accessed more or less like a regular buffer.
 461  * You must dmu_bonus_hold() to get the buffer, which will give you a
 462  * dmu_buf_t with db_offset==-1ULL, and db_size = the size of the bonus
 463  * data.  As with any normal buffer, you must call dmu_buf_read() to
 464  * read db_data, dmu_buf_will_dirty() before modifying it, and the
 465  * object must be held in an assigned transaction before calling
 466  * dmu_buf_will_dirty.  You may use dmu_buf_set_user() on the bonus
 467  * buffer as well.  You must release your hold with dmu_buf_rele().
 468  *
 469  * Returns ENOENT, EIO, or 0.
 470  */
 471 int dmu_bonus_hold(objset_t *os, uint64_t object, void *tag, dmu_buf_t **);
 472 int dmu_bonus_max(void);
 473 int dmu_set_bonus(dmu_buf_t *, int, dmu_tx_t *);
 474 int dmu_set_bonustype(dmu_buf_t *, dmu_object_type_t, dmu_tx_t *);
 475 dmu_object_type_t dmu_get_bonustype(dmu_buf_t *);
 476 int dmu_rm_spill(objset_t *, uint64_t, dmu_tx_t *);
 477 
 478 /*
 479  * Special spill buffer support used by "SA" framework
 480  */
 481 
 482 int dmu_spill_hold_by_bonus(dmu_buf_t *bonus, void *tag, dmu_buf_t **dbp);
 483 int dmu_spill_hold_by_dnode(dnode_t *dn, uint32_t flags,
 484     void *tag, dmu_buf_t **dbp);


 661 objset_t *dmu_buf_get_objset(dmu_buf_t *db);
 662 dnode_t *dmu_buf_dnode_enter(dmu_buf_t *db);
 663 void dmu_buf_dnode_exit(dmu_buf_t *db);
 664 
 665 /* Block until any in-progress dmu buf user evictions complete. */
 666 void dmu_buf_user_evict_wait(void);
 667 
 668 /*
 669  * Returns the blkptr associated with this dbuf, or NULL if not set.
 670  */
 671 struct blkptr *dmu_buf_get_blkptr(dmu_buf_t *db);
 672 
 673 /*
 674  * Indicate that you are going to modify the buffer's data (db_data).
 675  *
 676  * The transaction (tx) must be assigned to a txg (i.e. you've called
 677  * dmu_tx_assign()).  The buffer's object must be held in the tx
 678  * (ie. you've called dmu_tx_hold_object(tx, db->db_object)).
 679  */
 680 void dmu_buf_will_dirty(dmu_buf_t *db, dmu_tx_t *tx);
 681 void dmu_buf_will_dirty_sc(dmu_buf_t *db, dmu_tx_t *tx, boolean_t sc);
 682 
 683 /*
 684  * You must create a transaction, then hold the objects which you will
 685  * (or might) modify as part of this transaction.  Then you must assign
 686  * the transaction to a transaction group.  Once the transaction has
 687  * been assigned, you can modify buffers which belong to held objects as
 688  * part of this transaction.  You can't modify buffers before the
 689  * transaction has been assigned; you can't modify buffers which don't
 690  * belong to objects which this transaction holds; you can't hold
 691  * objects once the transaction has been assigned.  You may hold an
 692  * object which you are going to free (with dmu_object_free()), but you
 693  * don't have to.
 694  *
 695  * You can abort the transaction before it has been assigned.
 696  *
 697  * Note that you may hold buffers (with dmu_buf_hold) at any time,
 698  * regardless of transaction state.
 699  */
 700 
 701 #define DMU_NEW_OBJECT  (-1ULL)
 702 #define DMU_OBJECT_END  (-1ULL)
 703 
 704 dmu_tx_t *dmu_tx_create(objset_t *os);
 705 void dmu_tx_hold_write(dmu_tx_t *tx, uint64_t object, uint64_t off, int len);
 706 void dmu_tx_hold_write_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
 707     int len);
 708 void dmu_tx_hold_free(dmu_tx_t *tx, uint64_t object, uint64_t off,
 709     uint64_t len);
 710 void dmu_tx_hold_free_by_dnode(dmu_tx_t *tx, dnode_t *dn, uint64_t off,
 711     uint64_t len);
 712 void dmu_tx_hold_zap(dmu_tx_t *tx, uint64_t object, int add, const char *name);
 713 void dmu_tx_hold_zap_by_dnode(dmu_tx_t *tx, dnode_t *dn, int add,
 714     const char *name);
 715 void dmu_tx_hold_bonus(dmu_tx_t *tx, uint64_t object);
 716 void dmu_tx_hold_bonus_by_dnode(dmu_tx_t *tx, dnode_t *dn);
 717 void dmu_tx_hold_spill(dmu_tx_t *tx, uint64_t object);
 718 void dmu_tx_hold_sa(dmu_tx_t *tx, struct sa_handle *hdl, boolean_t may_grow);
 719 void dmu_tx_hold_sa_create(dmu_tx_t *tx, int total_size);
 720 void dmu_tx_abort(dmu_tx_t *tx);
 721 int dmu_tx_assign(dmu_tx_t *tx, enum txg_how txg_how);
 722 void dmu_tx_wait(dmu_tx_t *tx);
 723 void dmu_tx_commit(dmu_tx_t *tx);
 724 void dmu_tx_mark_netfree(dmu_tx_t *tx);
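The ordering rules in the comment above (create, hold, assign, modify, commit) can be modeled as a small state machine. The sketch below is a userland toy, not the DMU: every `sk_` name is hypothetical, and the error codes are invented for illustration, whereas the real code would generally assert on misuse rather than return an error.

```c
#include <errno.h>
#include <stdint.h>

#define	SK_MAX_HOLDS	8	/* arbitrary limit for this sketch */

typedef enum { SK_TX_OPEN, SK_TX_ASSIGNED, SK_TX_DONE } sk_tx_state_t;

typedef struct sk_tx {
	sk_tx_state_t state;
	uint64_t held[SK_MAX_HOLDS];
	int nheld;
} sk_tx_t;

static void
sk_tx_create(sk_tx_t *tx)
{
	tx->state = SK_TX_OPEN;
	tx->nheld = 0;
}

/* Objects can only be held before the transaction is assigned. */
static int
sk_tx_hold(sk_tx_t *tx, uint64_t object)
{
	if (tx->state != SK_TX_OPEN)
		return (EINVAL);
	if (tx->nheld == SK_MAX_HOLDS)
		return (ENOSPC);
	tx->held[tx->nheld++] = object;
	return (0);
}

static int
sk_tx_assign(sk_tx_t *tx)
{
	if (tx->state != SK_TX_OPEN)
		return (EINVAL);
	tx->state = SK_TX_ASSIGNED;
	return (0);
}

/* Modifying is legal only after assignment, and only for held objects. */
static int
sk_tx_will_dirty(const sk_tx_t *tx, uint64_t object)
{
	if (tx->state != SK_TX_ASSIGNED)
		return (EINVAL);
	for (int i = 0; i < tx->nheld; i++) {
		if (tx->held[i] == object)
			return (0);
	}
	return (ENOENT);
}

/* Abort is only allowed before assignment. */
static int
sk_tx_abort(sk_tx_t *tx)
{
	if (tx->state != SK_TX_OPEN)
		return (EINVAL);
	tx->state = SK_TX_DONE;
	return (0);
}

/* Commit is only allowed after assignment. */
static int
sk_tx_commit(sk_tx_t *tx)
{
	if (tx->state != SK_TX_ASSIGNED)
		return (EINVAL);
	tx->state = SK_TX_DONE;
	return (0);
}
```

A well-formed caller in this model goes sk_tx_create(), one or more sk_tx_hold() calls, sk_tx_assign(), its modifications, then sk_tx_commit(); if assignment is never attempted or fails, the only legal exit is sk_tx_abort().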
 725 
 726 /*
 727  * To register a commit callback, dmu_tx_callback_register() must be called.
 728  *
 729  * dcb_data is a pointer to caller private data that is passed on as a
 730  * callback parameter. The caller is responsible for properly allocating and
 731  * freeing it.
 732  *
 733  * When registering a callback, the transaction must already exist and must
 734  * not yet be committed or aborted.  It may or may not be assigned to a txg.
 735  *
 736  * The callback will be called after the transaction has been safely written
 737  * to stable storage and will also be called if the dmu_tx is aborted.
 738  * If there is any error which prevents the transaction from being committed to
 739  * disk, the callback will be called with a value of error != 0.
 740  */
 741 typedef void dmu_tx_callback_func_t(void *dcb_data, int error);
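The callback contract described above (caller-owned dcb_data, invoked on both commit and abort, error != 0 on failure) can be illustrated with a toy registry. Everything prefixed `sk_` is hypothetical, and passing ECANCELED on abort is an assumption for this sketch, since the excerpt does not say which error value an aborted transaction reports.

```c
#include <errno.h>
#include <stddef.h>

/* Same shape as dmu_tx_callback_func_t in the listing. */
typedef void sk_tx_callback_func_t(void *dcb_data, int error);

#define	SK_MAX_CALLBACKS	4	/* arbitrary limit for this sketch */

typedef struct sk_cbtx {
	sk_tx_callback_func_t *cb_func[SK_MAX_CALLBACKS];
	void *cb_data[SK_MAX_CALLBACKS];
	int ncb;
} sk_cbtx_t;

/* Remember a callback and its private data until the tx finishes. */
static int
sk_cbtx_register(sk_cbtx_t *tx, sk_tx_callback_func_t *func, void *data)
{
	if (func == NULL)
		return (EINVAL);
	if (tx->ncb == SK_MAX_CALLBACKS)
		return (ENOSPC);
	tx->cb_func[tx->ncb] = func;
	tx->cb_data[tx->ncb] = data;
	tx->ncb++;
	return (0);
}

/*
 * Fire every registered callback exactly once.  error is 0 for a
 * successful commit and nonzero otherwise (ECANCELED is assumed here
 * for an abort).
 */
static void
sk_cbtx_finish(sk_cbtx_t *tx, int error)
{
	for (int i = 0; i < tx->ncb; i++)
		tx->cb_func[i](tx->cb_data[i], error);
	tx->ncb = 0;
}

/* Example callback: record the outcome in caller-owned storage. */
typedef struct sk_result {
	int calls;
	int last_error;
} sk_result_t;

static void
sk_record_result(void *dcb_data, int error)
{
	sk_result_t *res = dcb_data;

	res->calls++;
	res->last_error = error;
}
```

The registry never allocates or frees dcb_data, which matches the rule that the caller owns that memory for the whole lifetime of the callback.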


 807         uint32_t doi_data_block_size;
 808         uint32_t doi_metadata_block_size;
 809         dmu_object_type_t doi_type;
 810         dmu_object_type_t doi_bonus_type;
 811         uint64_t doi_bonus_size;
 812         uint8_t doi_indirection;                /* 2 = dnode->indirect->data */
 813         uint8_t doi_checksum;
 814         uint8_t doi_compress;
 815         uint8_t doi_nblkptr;
 816         uint8_t doi_pad[4];
 817         uint64_t doi_physical_blocks_512;       /* data + metadata, 512b blks */
 818         uint64_t doi_max_offset;
 819         uint64_t doi_fill_count;                /* number of non-empty blocks */
 820 } dmu_object_info_t;
 821 
 822 typedef void arc_byteswap_func_t(void *buf, size_t size);
 823 
 824 typedef struct dmu_object_type_info {
 825         dmu_object_byteswap_t   ot_byteswap;
 826         boolean_t               ot_metadata;
 827         boolean_t               ot_dbuf_metadata_cache;
 828         char                    *ot_name;
 829 } dmu_object_type_info_t;
 830 
 831 typedef struct dmu_object_byteswap_info {
 832         arc_byteswap_func_t     *ob_func;
 833         char                    *ob_name;
 834 } dmu_object_byteswap_info_t;
 835 
 836 extern const dmu_object_type_info_t dmu_ot[DMU_OT_NUMTYPES];
 837 extern const dmu_object_byteswap_info_t dmu_ot_byteswap[DMU_BSWAP_NUMFUNCS];
 838 
 839 /*
 840  * Get information on a DMU object.
 841  *
 842  * Return 0 on success or ENOENT if object is not allocated.
 843  *
 844  * If doi is NULL, just indicates whether the object exists.
 845  */
 846 int dmu_object_info(objset_t *os, uint64_t object, dmu_object_info_t *doi);
 847 /* Like dmu_object_info, but faster if you have a held dnode in hand. */
 848 void dmu_object_info_from_dnode(dnode_t *dn, dmu_object_info_t *doi);
 849 /* Like dmu_object_info, but faster if you have a held dbuf in hand. */
 850 void dmu_object_info_from_db(dmu_buf_t *db, dmu_object_info_t *doi);
 851 /*
 852  * Like dmu_object_info_from_db, but faster still when you only care about
 853  * the size.  This is specifically optimized for zfs_getattr().
 854  */
 855 void dmu_object_size_from_db(dmu_buf_t *db, uint32_t *blksize,
 856     u_longlong_t *nblk512);
 857 
 858 typedef struct dmu_objset_stats {
 859         uint64_t dds_num_clones; /* number of clones of this */
 860         uint64_t dds_creation_txg;
 861         uint64_t dds_guid;
 862         dmu_objset_type_t dds_type;
 863         uint8_t dds_is_snapshot;
 864         uint8_t dds_is_autosnapshot;
 865         uint8_t dds_inconsistent;
 866         char dds_origin[ZFS_MAX_DATASET_NAME_LEN];
 867 } dmu_objset_stats_t;
 868 
 869 /*
 870  * Get stats on a dataset.
 871  */
 872 void dmu_objset_fast_stat(objset_t *os, dmu_objset_stats_t *stat);
 873 
 874 /*
 875  * Add entries to the nvlist for all the objset's properties.  See
 876  * zfs_prop_table[] and zfs(1m) for details on the properties.
 877  */
 878 void dmu_objset_stats(objset_t *os, struct nvlist *nv);
 879 
 880 /*
 881  * Get the space usage statistics for statvfs().
 882  *
 883  * refdbytes is the amount of space "referenced" by this objset.
 884  * availbytes is the amount of space available to this objset, taking


 898  * change, so there is a small probability that it will collide.)
 899  */
 900 uint64_t dmu_objset_fsid_guid(objset_t *os);
 901 
 902 /*
 903  * Get the [cm]time for an objset's snapshot dir
 904  */
 905 timestruc_t dmu_objset_snap_cmtime(objset_t *os);
 906 
 907 int dmu_objset_is_snapshot(objset_t *os);
 908 
 909 extern struct spa *dmu_objset_spa(objset_t *os);
 910 extern struct zilog *dmu_objset_zil(objset_t *os);
 911 extern struct dsl_pool *dmu_objset_pool(objset_t *os);
 912 extern struct dsl_dataset *dmu_objset_ds(objset_t *os);
 913 extern void dmu_objset_name(objset_t *os, char *buf);
 914 extern dmu_objset_type_t dmu_objset_type(objset_t *os);
 915 extern uint64_t dmu_objset_id(objset_t *os);
 916 extern zfs_sync_type_t dmu_objset_syncprop(objset_t *os);
 917 extern zfs_logbias_op_t dmu_objset_logbias(objset_t *os);
 918 int dmu_clone_list_next(objset_t *os, int len, char *name,
 919     uint64_t *idp, uint64_t *offp);
 920 extern int dmu_snapshot_list_next(objset_t *os, int namelen, char *name,
 921     uint64_t *id, uint64_t *offp, boolean_t *case_conflict);
 922 extern int dmu_snapshot_realname(objset_t *os, char *name, char *real,
 923     int maxlen, boolean_t *conflict);
 924 extern int dmu_dir_list_next(objset_t *os, int namelen, char *name,
 925     uint64_t *idp, uint64_t *offp);
 926 
 927 typedef int objset_used_cb_t(dmu_object_type_t bonustype,
 928     void *bonus, uint64_t *userp, uint64_t *groupp);
 929 extern void dmu_objset_register_type(dmu_objset_type_t ost,
 930     objset_used_cb_t *cb);
 931 extern void dmu_objset_set_user(objset_t *os, void *user_ptr);
 932 extern void *dmu_objset_get_user(objset_t *os);
 933 
 934 /*
 935  * Return the txg number for the given assigned transaction.
 936  */
 937 uint64_t dmu_tx_get_txg(dmu_tx_t *tx);
 938 
 939 /*