NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5911 ZFS "hangs" while deleting file
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-1823 Slow performance doing of a large dataset
5911 ZFS "hangs" while deleting file
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
NEX-3165 segregate ddt in arc
SUP-507 Delete or truncate of large files delayed on datasets with small recordsize
Reviewed by: Albert Lee <trisk@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Ilya Usvyatsky <ilya.usvyatsky@nexenta.com>
Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Make special vdev subtree topology the same as regular vdev subtree to simplify testcase setup
Fixup merge issues
Issue #7: add cacheability to the properties
          Contributors: Boris Protopopov
DDT is placed either into special or to L2ARC but not in both
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
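
The note above about aligning the mutex tables to 64 bytes, so that each lock sits on its own cache line, can be sketched in user space roughly as follows. This is only an illustration under assumptions: it uses pthread_mutex_t instead of kmutex_t, and C11 alignas instead of the explicit pad array that the diff adds under _KERNEL. The names padded_mutex_t, lock_table_t, TABLE_MUTEX and lock_table_init are hypothetical and do not appear in the change.

```c
#include <assert.h>
#include <pthread.h>
#include <stdalign.h>
#include <stddef.h>
#include <stdint.h>

#define LOCK_PAD	64	/* assumed cache-line size, mirrors DBUF_LOCK_PAD */
#define NLOCKS		256	/* power of two, so an index can be masked */

/*
 * Hypothetical user-space analogue of dbuf_mutex_t. alignas() forces the
 * struct size up to a multiple of LOCK_PAD, so neighbouring array slots
 * never share a cache line.
 */
typedef struct {
	alignas(LOCK_PAD) pthread_mutex_t mtx;
} padded_mutex_t;

static_assert(sizeof (padded_mutex_t) % LOCK_PAD == 0,
    "each lock must occupy whole cache lines");

typedef struct {
	padded_mutex_t locks[NLOCKS];
} lock_table_t;

/* Same shape as DBUF_HASH_MUTEX: mask the index, return the inner mutex. */
#define TABLE_MUTEX(t, idx)	(&(t)->locks[(idx) & (NLOCKS - 1)].mtx)

/* Initialize every mutex in the table with default attributes. */
static void
lock_table_init(lock_table_t *t)
{
	for (size_t i = 0; i < NLOCKS; i++)
		(void) pthread_mutex_init(&t->locks[i].mtx, NULL);
}
```

A caller would declare a lock_table_t, run lock_table_init() once, and then take TABLE_MUTEX(&table, hash) around work on the matching hash bucket. Threads working on adjacent buckets then contend only on the lock itself, not on a shared cache line.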

          --- old/usr/src/uts/common/fs/zfs/sys/dbuf.h
          +++ new/usr/src/uts/common/fs/zfs/sys/dbuf.h
[15 lines elided]
  16   16   * fields enclosed by brackets "[]" replaced with your own identifying
  17   17   * information: Portions Copyright [yyyy] [name of copyright owner]
  18   18   *
  19   19   * CDDL HEADER END
  20   20   */
  21   21  /*
  22   22   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  23   23   * Copyright (c) 2012, 2015 by Delphix. All rights reserved.
  24   24   * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
  25   25   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
       26 + * Copyright 2015 Nexenta Systems, Inc. All rights reserved.
  26   27   */
  27   28  
  28   29  #ifndef _SYS_DBUF_H
  29   30  #define _SYS_DBUF_H
  30   31  
  31   32  #include <sys/dmu.h>
  32   33  #include <sys/spa.h>
  33   34  #include <sys/txg.h>
  34   35  #include <sys/zio.h>
  35   36  #include <sys/arc.h>
[12 lines elided]
  48   49   * define flags for dbuf_read
  49   50   */
  50   51  
  51   52  #define DB_RF_MUST_SUCCEED      (1 << 0)
  52   53  #define DB_RF_CANFAIL           (1 << 1)
  53   54  #define DB_RF_HAVESTRUCT        (1 << 2)
  54   55  #define DB_RF_NOPREFETCH        (1 << 3)
  55   56  #define DB_RF_NEVERWAIT         (1 << 4)
  56   57  #define DB_RF_CACHED            (1 << 5)
  57   58  
       59 +#define DBUF_EVICT_ALL          -1
       60 +
  58   61  /*
  59   62   * The simplified state transition diagram for dbufs looks like:
  60   63   *
  61   64   *              +----> READ ----+
  62   65   *              |               |
  63   66   *              |               V
  64   67   *  (alloc)-->UNCACHED       CACHED-->EVICTING-->(free)
  65   68   *              |               ^        ^
  66   69   *              |               |        |
  67   70   *              +----> FILL ----+        |
[8 lines elided]
  76   79  typedef enum dbuf_states {
  77   80          DB_SEARCH = -1,
  78   81          DB_UNCACHED,
  79   82          DB_FILL,
  80   83          DB_NOFILL,
  81   84          DB_READ,
  82   85          DB_CACHED,
  83   86          DB_EVICTING
  84   87  } dbuf_states_t;
  85   88  
       89 +typedef enum dbuf_cached_state {
       90 +        DB_NO_CACHE = -1,
       91 +        DB_DBUF_CACHE,
       92 +        DB_DBUF_METADATA_CACHE,
       93 +        DB_CACHE_MAX
       94 +} dbuf_cached_state_t;
       95 +
  86   96  struct dnode;
  87   97  struct dmu_tx;
  88   98  
  89   99  /*
  90  100   * level = 0 means the user data
  91  101   * level = 1 means the single indirect block
  92  102   * etc.
  93  103   */
  94  104  
  95  105  struct dmu_buf_impl;
[22 lines elided]
 118  128  
 119  129          /* pointer to parent dirty record */
 120  130          struct dbuf_dirty_record *dr_parent;
 121  131  
 122  132          /* How much space was changed to dsl_pool_dirty_space() for this? */
 123  133          unsigned int dr_accounted;
 124  134  
 125  135          /* A copy of the bp that points to us */
 126  136          blkptr_t dr_bp_copy;
 127  137  
      138 +        /* use special class of dirty entry */
      139 +        boolean_t dr_usesc;
      140 +
 128  141          union dirty_types {
 129  142                  struct dirty_indirect {
 130  143  
 131  144                          /* protect access to list */
 132  145                          kmutex_t dr_mtx;
 133  146  
 134  147                          /* Our list of dirty children */
 135  148                          list_t dr_children;
 136  149                  } di;
 137  150                  struct dirty_leaf {
[84 lines elided]
 222  235  
 223  236          /* pointer to most recent dirty record for this buffer */
 224  237          dbuf_dirty_record_t *db_last_dirty;
 225  238  
 226  239          /*
 227  240           * Our link on the owner dnodes's dn_dbufs list.
 228  241           * Protected by its dn_dbufs_mtx.
 229  242           */
 230  243          avl_node_t db_link;
 231  244  
 232      -        /*
 233      -         * Link in dbuf_cache.
 234      -         */
      245 +        /* Link in dbuf_cache or dbuf_metadata_cache */
 235  246          multilist_node_t db_cache_link;
 236  247  
      248 +        /* Tells us which dbuf cache this dbuf is in, if any */
      249 +        dbuf_cached_state_t db_caching_status;
      250 +
 237  251          /* Data which is unique to data (leaf) blocks: */
 238  252  
 239  253          /* User callback information. */
 240  254          dmu_buf_user_t *db_user;
 241  255  
 242  256          /*
 243  257           * Evict user data as soon as the dirty and reference
 244  258           * counts are equal.
 245  259           */
 246  260          uint8_t db_user_immediate_evict;
[8 lines elided]
 255  269           * dnode_evict_dbufs() or dnode_evict_bonus() tried to
 256  270           * evict this dbuf, but couldn't due to outstanding
 257  271           * references.  Evict once the refcount drops to 0.
 258  272           */
 259  273          uint8_t db_pending_evict;
 260  274  
 261  275          uint8_t db_dirtycnt;
 262  276  } dmu_buf_impl_t;
 263  277  
 264  278  /* Note: the dbuf hash table is exposed only for the mdb module */
 265      -#define DBUF_MUTEXES 256
 266      -#define DBUF_HASH_MUTEX(h, idx) (&(h)->hash_mutexes[(idx) & (DBUF_MUTEXES-1)])
      279 +#define DBUF_MUTEXES    256
      280 +#define DBUF_LOCK_PAD   64
      281 +typedef struct {
      282 +        kmutex_t mtx;
      283 +#ifdef _KERNEL
      284 +        unsigned char pad[(DBUF_LOCK_PAD - sizeof (kmutex_t))];
      285 +#endif
      286 +} dbuf_mutex_t;
      287 +#define DBUF_HASH_MUTEX(h, idx) \
      288 +        (&((h)->hash_mutexes[(idx) & (DBUF_MUTEXES-1)].mtx))
 267  289  typedef struct dbuf_hash_table {
 268  290          uint64_t hash_table_mask;
 269  291          dmu_buf_impl_t **hash_table;
 270      -        kmutex_t hash_mutexes[DBUF_MUTEXES];
      292 +        dbuf_mutex_t hash_mutexes[DBUF_MUTEXES];
 271  293  } dbuf_hash_table_t;
 272  294  
 273  295  uint64_t dbuf_whichblock(struct dnode *di, int64_t level, uint64_t offset);
 274  296  
 275  297  dmu_buf_impl_t *dbuf_create_tlib(struct dnode *dn, char *data);
 276  298  void dbuf_create_bonus(struct dnode *dn);
 277  299  int dbuf_spill_set_blksz(dmu_buf_t *db, uint64_t blksz, dmu_tx_t *tx);
 278  300  void dbuf_spill_hold(struct dnode *dn, dmu_buf_impl_t **dbp, void *tag);
 279  301  
 280  302  void dbuf_rm_spill(struct dnode *dn, dmu_tx_t *tx);
[18 lines elided]
 299  321  
 300  322  dmu_buf_impl_t *dbuf_find(struct objset *os, uint64_t object, uint8_t level,
 301  323      uint64_t blkid);
 302  324  
 303  325  int dbuf_read(dmu_buf_impl_t *db, zio_t *zio, uint32_t flags);
 304  326  void dmu_buf_will_not_fill(dmu_buf_t *db, dmu_tx_t *tx);
 305  327  void dmu_buf_will_fill(dmu_buf_t *db, dmu_tx_t *tx);
 306  328  void dmu_buf_fill_done(dmu_buf_t *db, dmu_tx_t *tx);
 307  329  void dbuf_assign_arcbuf(dmu_buf_impl_t *db, arc_buf_t *buf, dmu_tx_t *tx);
 308  330  dbuf_dirty_record_t *dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
      331 +dbuf_dirty_record_t *dbuf_dirty_sc(dmu_buf_impl_t *db, dmu_tx_t *tx,
      332 +    boolean_t usesc);
 309  333  arc_buf_t *dbuf_loan_arcbuf(dmu_buf_impl_t *db);
 310  334  void dmu_buf_write_embedded(dmu_buf_t *dbuf, void *data,
 311  335      bp_embedded_type_t etype, enum zio_compress comp,
 312  336      int uncompressed_size, int compressed_size, int byteorder, dmu_tx_t *tx);
 313  337  
 314  338  void dbuf_destroy(dmu_buf_impl_t *db);
 315  339  
 316  340  void dbuf_setdirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
 317  341  void dbuf_unoverride(dbuf_dirty_record_t *dr);
 318  342  void dbuf_sync_list(list_t *list, int level, dmu_tx_t *tx);
 319  343  void dbuf_release_bp(dmu_buf_impl_t *db);
 320  344  
 321      -boolean_t dbuf_can_remap(const dmu_buf_impl_t *buf);
 322      -
 323  345  void dbuf_free_range(struct dnode *dn, uint64_t start, uint64_t end,
 324  346      struct dmu_tx *);
 325  347  
 326  348  void dbuf_new_size(dmu_buf_impl_t *db, int size, dmu_tx_t *tx);
 327  349  
 328  350  #define DB_DNODE(_db)           ((_db)->db_dnode_handle->dnh_dnode)
 329  351  #define DB_DNODE_LOCK(_db)      ((_db)->db_dnode_handle->dnh_zrlock)
 330  352  #define DB_DNODE_ENTER(_db)     (zrl_add(&DB_DNODE_LOCK(_db)))
 331  353  #define DB_DNODE_EXIT(_db)      (zrl_remove(&DB_DNODE_LOCK(_db)))
 332  354  #define DB_DNODE_HELD(_db)      (!zrl_is_zero(&DB_DNODE_LOCK(_db)))
 333  355  
 334  356  void dbuf_init(void);
 335  357  void dbuf_fini(void);
 336  358  
 337  359  boolean_t dbuf_is_metadata(dmu_buf_impl_t *db);
      360 +boolean_t dbuf_is_ddt(dmu_buf_impl_t *db);
      361 +boolean_t dbuf_ddt_is_l2cacheable(dmu_buf_impl_t *db);
      362 +boolean_t dbuf_meta_is_l2cacheable(dmu_buf_impl_t *db);
 338  363  
 339  364  #define DBUF_GET_BUFC_TYPE(_db) \
 340      -        (dbuf_is_metadata(_db) ? ARC_BUFC_METADATA : ARC_BUFC_DATA)
      365 +        (dbuf_is_ddt(_db) ? ARC_BUFC_DDT :\
      366 +        (dbuf_is_metadata(_db) ? ARC_BUFC_METADATA : ARC_BUFC_DATA))
 341  367  
 342  368  #define DBUF_IS_CACHEABLE(_db)                                          \
 343  369          ((_db)->db_objset->os_primary_cache == ZFS_CACHE_ALL ||         \
 344  370          (dbuf_is_metadata(_db) &&                                       \
 345  371          ((_db)->db_objset->os_primary_cache == ZFS_CACHE_METADATA)))
 346  372  
      373 +/*
      374 + * Checks whether we need to cache dbuf in l2arc.
      375 + * Metadata is l2cacheable if it is not placed on special device
      376 + * or it is placed on special device in "dual" mode. We need to check
      377 + * for ddt in ZFS_CACHE_ALL and ZFS_CACHE_METADATA because it is in MOS.
      378 + * ZFS_CACHE_DATA mode actually means to cache both data and cacheable
      379 + * metadata.
      380 + */
 347  381  #define DBUF_IS_L2CACHEABLE(_db)                                        \
 348      -        ((_db)->db_objset->os_secondary_cache == ZFS_CACHE_ALL ||       \
 349      -        (dbuf_is_metadata(_db) &&                                       \
 350      -        ((_db)->db_objset->os_secondary_cache == ZFS_CACHE_METADATA)))
      382 +        (((_db)->db_objset->os_secondary_cache == ZFS_CACHE_ALL &&      \
      383 +        (dbuf_ddt_is_l2cacheable(_db) == B_TRUE)) ||                    \
      384 +        ((_db)->db_objset->os_secondary_cache == ZFS_CACHE_METADATA &&  \
      385 +        (dbuf_is_metadata(_db)) &&                                      \
      386 +        (dbuf_ddt_is_l2cacheable(_db) == B_TRUE)) ||                    \
      387 +        ((dbuf_meta_is_l2cacheable(_db) == B_TRUE) &&                   \
      388 +        ((_db)->db_objset->os_secondary_cache == ZFS_CACHE_DATA)))
 351  389  
 352  390  #define DNODE_LEVEL_IS_L2CACHEABLE(_dn, _level)                         \
 353  391          ((_dn)->dn_objset->os_secondary_cache == ZFS_CACHE_ALL ||       \
 354  392          (((_level) > 0 ||                                               \
 355  393          DMU_OT_IS_METADATA((_dn)->dn_handle->dnh_dnode->dn_type)) &&    \
 356  394          ((_dn)->dn_objset->os_secondary_cache == ZFS_CACHE_METADATA)))
 357  395  
 358  396  #ifdef ZFS_DEBUG
 359  397  
 360  398  /*
[45 lines elided]
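
The new DBUF_IS_L2CACHEABLE macro combines three cases, which are easier to read as a plain function. The sketch below is a stand-alone model, not the kernel code: cache_mode_t, buf_model_t and model_is_l2cacheable are hypothetical names, and the three boolean fields stand in for the results of dbuf_is_metadata(), dbuf_ddt_is_l2cacheable() and dbuf_meta_is_l2cacheable(), whose implementations are not part of this header. What those helpers return for a given buffer is therefore left to the caller here.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the os_secondary_cache setting. */
typedef enum {
	CACHE_NONE,
	CACHE_METADATA,
	CACHE_ALL,
	CACHE_DATA
} cache_mode_t;

/* Hypothetical flattened view of the inputs the macro consults. */
typedef struct {
	cache_mode_t secondary_cache;
	bool is_metadata;	/* models dbuf_is_metadata() */
	bool ddt_l2cacheable;	/* models dbuf_ddt_is_l2cacheable() */
	bool meta_l2cacheable;	/* models dbuf_meta_is_l2cacheable() */
} buf_model_t;

/*
 * Mirrors the three OR-ed clauses of the macro:
 *   1. secondarycache=all, and the DDT check passes;
 *   2. secondarycache=metadata, the buffer is metadata, and the DDT
 *      check passes;
 *   3. secondarycache=data, and the metadata check passes.
 */
static bool
model_is_l2cacheable(const buf_model_t *b)
{
	return ((b->secondary_cache == CACHE_ALL && b->ddt_l2cacheable) ||
	    (b->secondary_cache == CACHE_METADATA && b->is_metadata &&
	    b->ddt_l2cacheable) ||
	    (b->meta_l2cacheable && b->secondary_cache == CACHE_DATA));
}
```

Written this way, it is visible that the DDT check gates both the "all" and "metadata" modes, while the "data" mode depends only on the metadata check, matching the comment that ZFS_CACHE_DATA covers both data and cacheable metadata.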