NEX-9200 Improve the scalability of attribute locking in zfs_zget
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-9552 zfs_scan_idle throttling harms performance and needs to be removed
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
NEX-13140 DVA-throttle support for special-class
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-13937 Improve kstat performance
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
NEX-6088 ZFS scrub/resilver take excessively long due to issuing lots of random IO
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-8711 backport illumos 7136 ESC_VDEV_REMOVE_AUX ought to always include vdev information
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
7136 ESC_VDEV_REMOVE_AUX ought to always include vdev information
7115 6922 generates ESC_ZFS_VDEV_REMOVE_AUX a bit too often
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Robert Mustacchi <rm@joyent.com>
NEX-6884 KRRP: replication deadlock due to unavailable resources
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5856 ddt_capped isn't reset when deduped dataset is destroyed
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-5553 ZFS auto-trim, manual-trim and scrub can race and deadlock
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5367 special vdev: sync-write options (NEW)
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5318 Cleanup specialclass property (obsolete, not used) and fix related meta-to-special case
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5255 speed-up migration of the write-cached data
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5064 On-demand trim should store operation start and stop time
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5219 WBC: Add capability to delay migration
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5186 smf-tests contains built files and it shouldn't
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-5168 cleanup and productize non-default latency based writecache load-balancer
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4940 Special Vdev operation in presence (or absence) of IO Errors
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4934 Add capability to remove special vdev
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4807 writecache load-balancing statistics: several distinct problems, must be revisited and revised
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4876 On-demand TRIM shouldn't use system_taskq and should queue jobs
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4794 Write Back Cache sync and async writes: adjust routing according to watermark limits
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-4620 ZFS autotrim triggering is unreliable
NEX-4622 On-demand TRIM code illogically enumerates metaslabs via mg_ms_tree
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
NEX-4619 Want kstats to monitor TRIM and UNMAP operation
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Hans Rosenfeld <hans.rosenfeld@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
5818 zfs {ref}compressratio is incorrect with 4k sector size
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Richard Elling <richard.elling@richardelling.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-3502 dedup ceiling should set a pool prop when cap is in effect
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-3984 On-demand TRIM
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Conflicts:
        usr/src/common/zfs/zpool_prop.c
        usr/src/uts/common/sys/fs/zfs.h
NEX-3558 KRRP Integration
NEX-3508 CLONE - Port NEX-2946 Add UNMAP/TRIM functionality to ZFS and illumos
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Conflicts:
    usr/src/uts/common/io/scsi/targets/sd.c
    usr/src/uts/common/sys/scsi/targets/sddef.h
NEX-3165 need some dedup improvements
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
SUP-577 deadlock between zpool detach and syseventd
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Issue #27: Auto best-effort dedup enable/disable - settable per pool
Issue #7: Reconcile L2ARC and "special" use by datasets
Issue #2: optimize DDE lookup in DDT objects
Added an option to control the number of DDE classes in the DDT.
The new default is one; that is, all DDEs are stored together
regardless of refcount.
Issue #3: Add support for parametrized number of copies for DDTs
Issue #25: Add a pool-level property that controls the number of copies of DDTs in the pool.
re #12643 rb4064 ZFS meta refactoring - vdev utilization tracking, auto-dedup
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties
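
Reviewer note: the spa_meta_placement_t structure added in this change carries a global switch, two pool-level metadata switches, and a size limit for small data. The following is a minimal sketch of how such fields might be consulted, not the actual spa_refine_meta_placement() logic from special.c. The struct is a local copy of the fields shown in the diff; the enum sketch_block_kind_t and the helper sketch_goes_to_special() are hypothetical names, and the handling of a disabled global switch and of a zero size limit are assumptions made only for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copy of the struct added by this change (fields as in the diff). */
typedef struct spa_meta_placement {
	uint64_t spa_enable_meta_placement_selection;
	uint64_t spa_ddt_meta_to_special;
	uint64_t spa_zfs_meta_to_special;
	uint64_t spa_small_data_to_special;
	uint64_t spa_sync_to_special;
} spa_meta_placement_t;

/*
 * Hypothetical block kinds for this sketch. The real code classifies
 * blocks by DMU object type (DMU_OT_IS_DDT_META() and friends).
 */
typedef enum sketch_block_kind {
	SKETCH_DDT_META,
	SKETCH_ZFS_META,
	SKETCH_DATA
} sketch_block_kind_t;

/*
 * Hypothetical helper: decide whether a block of the given kind and
 * size would be placed on the special class.
 */
static bool
sketch_goes_to_special(const spa_meta_placement_t *mp,
    sketch_block_kind_t kind, uint64_t size)
{
	/* Assumption: with the global switch off, nothing is selected. */
	if (mp->spa_enable_meta_placement_selection == 0)
		return (false);

	switch (kind) {
	case SKETCH_DDT_META:
		return (mp->spa_ddt_meta_to_special != 0);
	case SKETCH_ZFS_META:
		return (mp->spa_zfs_meta_to_special != 0);
	case SKETCH_DATA:
		/*
		 * Assumption: a limit of 0 disables small-data placement
		 * and the limit itself is inclusive.
		 */
		return (mp->spa_small_data_to_special != 0 &&
		    size <= mp->spa_small_data_to_special);
	}
	return (false);
}
```

ZPL-Meta is left out of the sketch because the header comment describes it as a dataset-level property, so it would not be read from this pool-level structure.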

          --- old/usr/src/uts/common/fs/zfs/sys/spa_impl.h
          +++ new/usr/src/uts/common/fs/zfs/sys/spa_impl.h
[12 lines elided]
  13   13   * When distributing Covered Code, include this CDDL HEADER in each
  14   14   * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15   15   * If applicable, add the following below this CDDL HEADER, with the
  16   16   * fields enclosed by brackets "[]" replaced with your own identifying
  17   17   * information: Portions Copyright [yyyy] [name of copyright owner]
  18   18   *
  19   19   * CDDL HEADER END
  20   20   */
  21   21  /*
  22   22   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  23      - * Copyright (c) 2011, 2018 by Delphix. All rights reserved.
  24      - * Copyright 2011 Nexenta Systems, Inc.  All rights reserved.
       23 + * Copyright (c) 2011, 2015 by Delphix. All rights reserved.
  25   24   * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
       25 + * Copyright 2017 Nexenta Systems, Inc.  All rights reserved.
  26   26   * Copyright 2013 Saso Kiselkov. All rights reserved.
  27   27   * Copyright (c) 2017 Datto Inc.
  28   28   */
  29   29  
  30   30  #ifndef _SYS_SPA_IMPL_H
  31   31  #define _SYS_SPA_IMPL_H
  32   32  
  33   33  #include <sys/spa.h>
  34   34  #include <sys/vdev.h>
  35      -#include <sys/vdev_removal.h>
       35 +#include <sys/vdev_impl.h>
  36   36  #include <sys/metaslab.h>
  37   37  #include <sys/dmu.h>
  38   38  #include <sys/dsl_pool.h>
  39   39  #include <sys/uberblock_impl.h>
  40   40  #include <sys/zfs_context.h>
  41   41  #include <sys/avl.h>
  42   42  #include <sys/refcount.h>
  43   43  #include <sys/bplist.h>
  44   44  #include <sys/bpobj.h>
       45 +#include <sys/special_impl.h>
       46 +#include <sys/wbc.h>
  45   47  #include <sys/zfeature.h>
  46      -#include <sys/zthr.h>
  47   48  #include <zfeature_common.h>
       49 +#include <sys/autosnap.h>
  48   50  
  49   51  #ifdef  __cplusplus
  50   52  extern "C" {
  51   53  #endif
  52   54  
       55 +/*
       56 + * This (illegal) pool name is used when temporarily importing a spa_t in order
       57 + * to get the vdev stats associated with the imported devices.
       58 + */
       59 +#define TRYIMPORT_NAME  "$import"
       60 +
  53   61  typedef struct spa_error_entry {
  54   62          zbookmark_phys_t        se_bookmark;
  55   63          char                    *se_name;
  56   64          avl_node_t              se_avl;
  57   65  } spa_error_entry_t;
  58   66  
  59   67  typedef struct spa_history_phys {
  60   68          uint64_t sh_pool_create_len;    /* ending offset of zpool create */
  61   69          uint64_t sh_phys_max_off;       /* physical EOF */
  62   70          uint64_t sh_bof;                /* logical BOF */
  63   71          uint64_t sh_eof;                /* logical EOF */
  64   72          uint64_t sh_records_lost;       /* num of records overwritten */
  65   73  } spa_history_phys_t;
  66   74  
  67      -/*
  68      - * All members must be uint64_t, for byteswap purposes.
  69      - */
  70      -typedef struct spa_removing_phys {
  71      -        uint64_t sr_state; /* dsl_scan_state_t */
  72      -
  73      -        /*
  74      -         * The vdev ID that we most recently attempted to remove,
  75      -         * or -1 if no removal has been attempted.
  76      -         */
  77      -        uint64_t sr_removing_vdev;
  78      -
  79      -        /*
  80      -         * The vdev ID that we most recently successfully removed,
  81      -         * or -1 if no devices have been removed.
  82      -         */
  83      -        uint64_t sr_prev_indirect_vdev;
  84      -
  85      -        uint64_t sr_start_time;
  86      -        uint64_t sr_end_time;
  87      -
  88      -        /*
  89      -         * Note that we can not use the space map's or indirect mapping's
  90      -         * accounting as a substitute for these values, because we need to
  91      -         * count frees of not-yet-copied data as though it did the copy.
  92      -         * Otherwise, we could get into a situation where copied > to_copy,
  93      -         * or we complete before copied == to_copy.
  94      -         */
  95      -        uint64_t sr_to_copy; /* bytes that need to be copied */
  96      -        uint64_t sr_copied; /* bytes that have been copied or freed */
  97      -} spa_removing_phys_t;
  98      -
  99      -/*
 100      - * This struct is stored as an entry in the DMU_POOL_DIRECTORY_OBJECT
 101      - * (with key DMU_POOL_CONDENSING_INDIRECT).  It is present if a condense
 102      - * of an indirect vdev's mapping object is in progress.
 103      - */
 104      -typedef struct spa_condensing_indirect_phys {
 105      -        /*
 106      -         * The vdev ID of the indirect vdev whose indirect mapping is
 107      -         * being condensed.
 108      -         */
 109      -        uint64_t        scip_vdev;
 110      -
 111      -        /*
 112      -         * The vdev's old obsolete spacemap.  This spacemap's contents are
 113      -         * being integrated into the new mapping.
 114      -         */
 115      -        uint64_t        scip_prev_obsolete_sm_object;
 116      -
 117      -        /*
 118      -         * The new mapping object that is being created.
 119      -         */
 120      -        uint64_t        scip_next_mapping_object;
 121      -} spa_condensing_indirect_phys_t;
 122      -
 123   75  struct spa_aux_vdev {
 124   76          uint64_t        sav_object;             /* MOS object for device list */
 125   77          nvlist_t        *sav_config;            /* cached device config */
 126   78          vdev_t          **sav_vdevs;            /* devices */
 127   79          int             sav_count;              /* number devices */
 128   80          boolean_t       sav_sync;               /* sync the device list */
 129   81          nvlist_t        **sav_pending;          /* pending device additions */
 130   82          uint_t          sav_npending;           /* # pending devices */
 131   83  };
 132   84  
[42 lines elided]
 175  127          taskq_t **stqs_taskq;
 176  128  } spa_taskqs_t;
 177  129  
 178  130  typedef enum spa_all_vdev_zap_action {
 179  131          AVZ_ACTION_NONE = 0,
 180  132          AVZ_ACTION_DESTROY,     /* Destroy all per-vdev ZAPs and the AVZ. */
 181  133          AVZ_ACTION_REBUILD,     /* Populate the new AVZ, see spa_avz_rebuild */
 182  134          AVZ_ACTION_INITIALIZE
 183  135  } spa_avz_action_t;
 184  136  
 185      -typedef enum spa_config_source {
 186      -        SPA_CONFIG_SRC_NONE = 0,
 187      -        SPA_CONFIG_SRC_SCAN,            /* scan of path (default: /dev/dsk) */
 188      -        SPA_CONFIG_SRC_CACHEFILE,       /* any cachefile */
 189      -        SPA_CONFIG_SRC_TRYIMPORT,       /* returned from call to tryimport */
 190      -        SPA_CONFIG_SRC_SPLIT,           /* new pool in a pool split */
 191      -        SPA_CONFIG_SRC_MOS              /* MOS, but not always from right txg */
 192      -} spa_config_source_t;
      137 +typedef enum spa_watermark {
      138 +        SPA_WM_NONE,
      139 +        SPA_WM_LOW,
      140 +        SPA_WM_HIGH
      141 +} spa_watermark_t;
 193  142  
      143 +/*
      144 + * average utilization, latency and throughput
      145 + * for spa and special/normal classes
      146 + */
      147 +typedef struct spa_avg_stat {
      148 +        uint64_t spa_utilization;
      149 +        uint64_t special_utilization;
      150 +        uint64_t normal_utilization;
      151 +        uint64_t special_latency;
      152 +        uint64_t normal_latency;
      153 +        uint64_t special_throughput;
      154 +        uint64_t normal_throughput;
      155 +} spa_avg_stat_t;
      156 +
      157 +typedef struct spa_perfmon_data {
      158 +        kthread_t               *perfmon_thread;
      159 +        boolean_t               perfmon_thr_exit;
      160 +        kmutex_t                perfmon_lock;
      161 +        kcondvar_t              perfmon_cv;
      162 +} spa_perfmon_data_t;
      163 +
      164 +/*
      165 + * Metaplacement controls 3-types of meta
      166 + * (see spa_refine_meta_placement() in special.c):
      167 + * - DDT-Meta (pool level property) (see DMU_OT_IS_DDT_META())
      168 + * - ZPL-Meta (dataset level property) (see DMU_OT_IS_ZPL_META())
      169 + * - ZFS-Meta (pool level property) all other metadata except
      170 + * DDT-Meta and ZPL-Meta
      171 + *
      172 + * spa_enable_meta_placement_selection is global switch
      173 + *
      174 + * spa_small_data_to_special contains max size of data that
      175 + * can be placed on special
      176 + *
      177 + * spa_sync_to_special uses special device for slog synchronous transactions
      178 + */
      179 +typedef struct spa_meta_placement {
      180 +        uint64_t spa_enable_meta_placement_selection;
      181 +        uint64_t spa_ddt_meta_to_special;
      182 +        uint64_t spa_zfs_meta_to_special;
      183 +        uint64_t spa_small_data_to_special;
      184 +        uint64_t spa_sync_to_special;
      185 +} spa_meta_placement_t;
      186 +
      187 +typedef struct spa_trimstats spa_trimstats_t;
      188 +
 194  189  struct spa {
 195  190          /*
 196  191           * Fields protected by spa_namespace_lock.
 197  192           */
 198  193          char            spa_name[ZFS_MAX_DATASET_NAME_LEN];     /* pool name */
 199  194          char            *spa_comment;           /* comment */
 200  195          avl_node_t      spa_avl;                /* node in spa_namespace_avl */
 201  196          nvlist_t        *spa_config;            /* last synced config */
 202  197          nvlist_t        *spa_config_syncing;    /* currently syncing config */
 203  198          nvlist_t        *spa_config_splitting;  /* config for splitting */
 204  199          nvlist_t        *spa_load_info;         /* info and errors from load */
 205  200          uint64_t        spa_config_txg;         /* txg of last config change */
 206  201          int             spa_sync_pass;          /* iterate-to-convergence */
 207  202          pool_state_t    spa_state;              /* pool state */
 208  203          int             spa_inject_ref;         /* injection references */
 209  204          uint8_t         spa_sync_on;            /* sync threads are running */
 210  205          spa_load_state_t spa_load_state;        /* current load operation */
 211      -        boolean_t       spa_indirect_vdevs_loaded; /* mappings loaded? */
 212      -        boolean_t       spa_trust_config;       /* do we trust vdev tree? */
 213      -        spa_config_source_t spa_config_source;  /* where config comes from? */
 214  206          uint64_t        spa_import_flags;       /* import specific flags */
 215  207          spa_taskqs_t    spa_zio_taskq[ZIO_TYPES][ZIO_TASKQ_TYPES];
 216  208          dsl_pool_t      *spa_dsl_pool;
 217  209          boolean_t       spa_is_initializing;    /* true while opening pool */
 218  210          metaslab_class_t *spa_normal_class;     /* normal data class */
 219  211          metaslab_class_t *spa_log_class;        /* intent log data class */
      212 +        metaslab_class_t *spa_special_class;    /* special usage class */
 220  213          uint64_t        spa_first_txg;          /* first txg after spa_open() */
 221  214          uint64_t        spa_final_txg;          /* txg of export/destroy */
 222  215          uint64_t        spa_freeze_txg;         /* freeze pool at this txg */
 223  216          uint64_t        spa_load_max_txg;       /* best initial ub_txg */
 224  217          uint64_t        spa_claim_max_txg;      /* highest claimed birth txg */
 225  218          timespec_t      spa_loaded_ts;          /* 1st successful open time */
 226  219          objset_t        *spa_meta_objset;       /* copy of dp->dp_meta_objset */
 227  220          kmutex_t        spa_evicting_os_lock;   /* Evicting objset list lock */
 228  221          list_t          spa_evicting_os_list;   /* Objsets being evicted. */
 229  222          kcondvar_t      spa_evicting_os_cv;     /* Objset Eviction Completion */
 230  223          txg_list_t      spa_vdev_txg_list;      /* per-txg dirty vdev list */
 231  224          vdev_t          *spa_root_vdev;         /* top-level vdev container */
 232  225          int             spa_min_ashift;         /* of vdevs in normal class */
 233  226          int             spa_max_ashift;         /* of vdevs in normal class */
 234  227          uint64_t        spa_config_guid;        /* config pool guid */
 235  228          uint64_t        spa_load_guid;          /* spa_load initialized guid */
 236  229          uint64_t        spa_last_synced_guid;   /* last synced guid */
 237  230          list_t          spa_config_dirty_list;  /* vdevs with dirty config */
 238  231          list_t          spa_state_dirty_list;   /* vdevs with dirty state */
 239      -        kmutex_t        spa_alloc_lock;
 240      -        avl_tree_t      spa_alloc_tree;
 241  232          spa_aux_vdev_t  spa_spares;             /* hot spares */
 242  233          spa_aux_vdev_t  spa_l2cache;            /* L2ARC cache devices */
 243  234          nvlist_t        *spa_label_features;    /* Features for reading MOS */
 244  235          uint64_t        spa_config_object;      /* MOS object for pool config */
 245  236          uint64_t        spa_config_generation;  /* config generation number */
 246  237          uint64_t        spa_syncing_txg;        /* txg currently syncing */
 247  238          bpobj_t         spa_deferred_bpobj;     /* deferred-free bplist */
 248  239          bplist_t        spa_free_bplist[TXG_SIZE]; /* bplist of stuff to free */
 249  240          zio_cksum_salt_t spa_cksum_salt;        /* secret salt for cksum */
 250  241          /* checksum context templates */
 251  242          kmutex_t        spa_cksum_tmpls_lock;
 252  243          void            *spa_cksum_tmpls[ZIO_CHECKSUM_FUNCTIONS];
 253  244          uberblock_t     spa_ubsync;             /* last synced uberblock */
 254  245          uberblock_t     spa_uberblock;          /* current uberblock */
 255  246          boolean_t       spa_extreme_rewind;     /* rewind past deferred frees */
 256      -        uint64_t        spa_last_io;            /* lbolt of last non-scan I/O */
 257  247          kmutex_t        spa_scrub_lock;         /* resilver/scrub lock */
 258  248          uint64_t        spa_scrub_inflight;     /* in-flight scrub I/Os */
 259  249          kcondvar_t      spa_scrub_io_cv;        /* scrub I/O completion */
 260  250          uint8_t         spa_scrub_active;       /* active or suspended? */
 261  251          uint8_t         spa_scrub_type;         /* type of scrub we're doing */
 262  252          uint8_t         spa_scrub_finished;     /* indicator to rotate logs */
 263  253          uint8_t         spa_scrub_started;      /* started since last boot */
 264  254          uint8_t         spa_scrub_reopen;       /* scrub doing vdev_reopen */
 265  255          uint64_t        spa_scan_pass_start;    /* start time per pass/reboot */
 266  256          uint64_t        spa_scan_pass_scrub_pause; /* scrub pause time */
 267  257          uint64_t        spa_scan_pass_scrub_spent_paused; /* total paused */
 268  258          uint64_t        spa_scan_pass_exam;     /* examined bytes per pass */
      259 +        uint64_t        spa_scan_pass_work;     /* actually processed bytes */
 269  260          kmutex_t        spa_async_lock;         /* protect async state */
 270  261          kthread_t       *spa_async_thread;      /* thread doing async task */
 271  262          int             spa_async_suspended;    /* async tasks suspended */
 272  263          kcondvar_t      spa_async_cv;           /* wait for thread_exit() */
 273  264          uint16_t        spa_async_tasks;        /* async task mask */
 274      -        uint64_t        spa_missing_tvds;       /* unopenable tvds on load */
 275      -        uint64_t        spa_missing_tvds_allowed; /* allow loading spa? */
 276      -
 277      -        spa_removing_phys_t spa_removing_phys;
 278      -        spa_vdev_removal_t *spa_vdev_removal;
 279      -
 280      -        spa_condensing_indirect_phys_t  spa_condensing_indirect_phys;
 281      -        spa_condensing_indirect_t       *spa_condensing_indirect;
 282      -        zthr_t          *spa_condense_zthr;     /* zthr doing condense. */
 283      -
 284  265          char            *spa_root;              /* alternate root directory */
 285  266          uint64_t        spa_ena;                /* spa-wide ereport ENA */
 286  267          int             spa_last_open_failed;   /* error if last open failed */
 287  268          uint64_t        spa_last_ubsync_txg;    /* "best" uberblock txg */
 288  269          uint64_t        spa_last_ubsync_txg_ts; /* timestamp from that ub */
 289  270          uint64_t        spa_load_txg;           /* ub txg that loaded */
 290  271          uint64_t        spa_load_txg_ts;        /* timestamp from that ub */
 291  272          uint64_t        spa_load_meta_errors;   /* verify metadata err count */
 292  273          uint64_t        spa_load_data_errors;   /* verify data err count */
 293  274          uint64_t        spa_verify_min_txg;     /* start txg of verify scrub */
[2 lines elided]
 296  277          uint64_t        spa_errlog_scrub;       /* scrub error log object */
 297  278          kmutex_t        spa_errlist_lock;       /* error list/ereport lock */
 298  279          avl_tree_t      spa_errlist_last;       /* last error list */
 299  280          avl_tree_t      spa_errlist_scrub;      /* scrub error list */
 300  281          uint64_t        spa_deflate;            /* should we deflate? */
 301  282          uint64_t        spa_history;            /* history object */
 302  283          kmutex_t        spa_history_lock;       /* history lock */
 303  284          vdev_t          *spa_pending_vdev;      /* pending vdev additions */
 304  285          kmutex_t        spa_props_lock;         /* property lock */
 305  286          uint64_t        spa_pool_props_object;  /* object for properties */
      287 +        kmutex_t        spa_cos_props_lock;     /* property lock */
      288 +        uint64_t        spa_cos_props_object;   /* object for cos properties */
      289 +        kmutex_t        spa_vdev_props_lock;    /* property lock */
      290 +        uint64_t        spa_vdev_props_object;  /* object for vdev properties */
 306  291          uint64_t        spa_bootfs;             /* default boot filesystem */
 307  292          uint64_t        spa_failmode;           /* failure mode for the pool */
 308  293          uint64_t        spa_delegation;         /* delegation on/off */
 309  294          list_t          spa_config_list;        /* previous cache file(s) */
 310  295          /* per-CPU array of root of async I/O: */
 311  296          zio_t           **spa_async_zio_root;
 312  297          zio_t           *spa_suspend_zio_root;  /* root of all suspended I/O */
 313      -        zio_t           *spa_txg_zio[TXG_SIZE]; /* spa_sync() waits for this */
 314  298          kmutex_t        spa_suspend_lock;       /* protects suspend_zio_root */
 315  299          kcondvar_t      spa_suspend_cv;         /* notification of resume */
 316  300          uint8_t         spa_suspended;          /* pool is suspended */
 317  301          uint8_t         spa_claiming;           /* pool is doing zil_claim() */
 318  302          boolean_t       spa_debug;              /* debug enabled? */
 319  303          boolean_t       spa_is_root;            /* pool is root */
 320  304          int             spa_minref;             /* num refs when first opened */
 321  305          int             spa_mode;               /* FREAD | FWRITE */
 322  306          spa_log_state_t spa_log_state;          /* log state */
 323  307          uint64_t        spa_autoexpand;         /* lun expansion on/off */
 324  308          uint64_t        spa_bootsize;           /* efi system partition size */
 325  309          ddt_t           *spa_ddt[ZIO_CHECKSUM_FUNCTIONS]; /* in-core DDTs */
 326  310          uint64_t        spa_ddt_stat_object;    /* DDT statistics */
 327  311          uint64_t        spa_dedup_ditto;        /* dedup ditto threshold */
 328  312          uint64_t        spa_dedup_checksum;     /* default dedup checksum */
      313 +        uint64_t        spa_ddt_msize;          /* ddt size in core, from ddo */
      314 +        uint64_t        spa_ddt_dsize;          /* ddt size on disk, from ddo */
 329  315          uint64_t        spa_dspace;             /* dspace in normal class */
 330  316          kmutex_t        spa_vdev_top_lock;      /* dueling offline/remove */
 331  317          kmutex_t        spa_proc_lock;          /* protects spa_proc* */
 332  318          kcondvar_t      spa_proc_cv;            /* spa_proc_state transitions */
 333  319          spa_proc_state_t spa_proc_state;        /* see definition */
 334  320          struct proc     *spa_proc;              /* "zpool-poolname" process */
 335  321          uint64_t        spa_did;                /* if procp != p0, did of t1 */
 336  322          boolean_t       spa_autoreplace;        /* autoreplace set in open */
 337  323          int             spa_vdev_locks;         /* locks grabbed */
 338  324          uint64_t        spa_creation_version;   /* version at pool creation */
[4 lines elided]
 343  329          uint64_t        spa_feat_enabled_txg_obj; /* Feature enabled txg */
 344  330          /* cache feature refcounts */
 345  331          uint64_t        spa_feat_refcount_cache[SPA_FEATURES];
 346  332          cyclic_id_t     spa_deadman_cycid;      /* cyclic id */
 347  333          uint64_t        spa_deadman_calls;      /* number of deadman calls */
 348  334          hrtime_t        spa_sync_starttime;     /* starting time fo spa_sync */
 349  335          uint64_t        spa_deadman_synctime;   /* deadman expiration timer */
 350  336          uint64_t        spa_all_vdev_zaps;      /* ZAP of per-vd ZAP obj #s */
 351  337          spa_avz_action_t        spa_avz_action; /* destroy/rebuild AVZ? */
 352  338  
      339 +        /* TRIM */
      340 +        uint64_t        spa_force_trim;         /* force sending trim? */
      341 +        uint64_t        spa_auto_trim;          /* see spa_auto_trim_t */
      342 +
      343 +        kmutex_t        spa_auto_trim_lock;
      344 +        kcondvar_t      spa_auto_trim_done_cv;  /* all autotrim thrd's exited */
      345 +        uint64_t        spa_num_auto_trimming;  /* # of autotrim threads */
      346 +        taskq_t         *spa_auto_trim_taskq;
      347 +
      348 +        kmutex_t        spa_man_trim_lock;
      349 +        uint64_t        spa_man_trim_rate;      /* rate of trim in bytes/sec */
      350 +        uint64_t        spa_num_man_trimming;   /* # of manual trim threads */
      351 +        boolean_t       spa_man_trim_stop;      /* requested manual trim stop */
      352 +        kcondvar_t      spa_man_trim_update_cv; /* updates to TRIM settings */
      353 +        kcondvar_t      spa_man_trim_done_cv;   /* manual trim has completed */
      354 +        /* For details on trim start/stop times see spa_get_trim_prog. */
      355 +        uint64_t        spa_man_trim_start_time;
      356 +        uint64_t        spa_man_trim_stop_time;
      357 +        taskq_t         *spa_man_trim_taskq;
      358 +
 353  359          /*
 354  360           * spa_iokstat_lock protects spa_iokstat and
 355  361           * spa_queue_stats[].
 356  362           */
 357  363          kmutex_t        spa_iokstat_lock;
 358  364          struct kstat    *spa_iokstat;           /* kstat of io to this pool */
 359  365          struct {
 360      -                int spa_active;
 361      -                int spa_queued;
      366 +                uint64_t spa_active;
      367 +                uint64_t spa_queued;
 362  368          } spa_queue_stats[ZIO_PRIORITY_NUM_QUEUEABLE];
 363  369  
      370 +        /* Pool-wide scrub & resilver priority values. */
      371 +        uint64_t        spa_scrub_prio;
      372 +        uint64_t        spa_resilver_prio;
      373 +
      374 +        /* TRIM/UNMAP kstats */
      375 +        spa_trimstats_t *spa_trimstats;         /* alloc'd by kstat_create */
      376 +        struct kstat    *spa_trimstats_ks;
      377 +
 364  378          hrtime_t        spa_ccw_fail_time;      /* Conf cache write fail time */
 365  379  
      380 +        /* total space on all L2ARC devices used for DDT (l2arc_ddt=on) */
      381 +        uint64_t spa_l2arc_ddt_devs_size;
      382 +
      383 +        /* if 1 this means we have stopped DDT growth for this pool */
      384 +        uint8_t spa_ddt_capped;
      385 +
      386 +        /* specialclass support */
      387 +        boolean_t       spa_usesc;              /* enable special class */
      388 +        uint64_t        spa_special_vdev_correction_rate;
      389 +        uint64_t        spa_minwat;             /* min watermark percent */
      390 +        uint64_t        spa_lowat;              /* low watermark percent */
      391 +        uint64_t        spa_hiwat;              /* high watermark percent */
      392 +        uint64_t        spa_lwm_space;          /* low watermark */
      393 +        uint64_t        spa_hwm_space;          /* high watermark */
      394 +        uint64_t        spa_wbc_wm_range;       /* high wm - low wm */
      395 +        uint8_t         spa_wbc_perc;           /* percent of writes to spec. */
      396 +        spa_watermark_t spa_watermark;
      397 +        boolean_t       spa_special_has_errors;
      398 +
      399 +        /* Write Back Cache */
      400 +        uint64_t        spa_wbc_mode;
      401 +        wbc_data_t      spa_wbc;
      402 +
      403 +        /* cos list */
      404 +        list_t          spa_cos_list;
      405 +
 366  406          /*
 367      -         * spa_refcount & spa_config_lock must be the last elements
      407 +         * utilization, latency and throughput statistics per metaslab_class
      408 +         * to aid dynamic balancing of I/O across normal and special classes
      409 +         */
      410 +        uint64_t                spa_avg_stat_rotor;
      411 +        spa_avg_stat_t          spa_avg_stat;
      412 +
      413 +        spa_perfmon_data_t      spa_perfmon;
      414 +
      415 +        /*
      416 +         * Percentage of total write traffic routed to the special class when
      417 +         * the latter is working as writeback cache.
      418 +         * Note that this value is continuously recomputed at runtime based on
      419 +         * the configured load-balancing mechanism (see spa_special_selection)
      420 +         * For instance, 0% would mean that special class is not to be used
      421 +         * for new writes, etc.
      422 +         */
      423 +        uint64_t spa_special_to_normal_ratio;
      424 +
      425 +        /*
      426 +         * last re-routing delta value for the spa_special_to_normal_ratio
      427 +         */
      428 +        int64_t spa_special_to_normal_delta;
      429 +
      430 +        /* target percentage of data to be considered for dedup */
      431 +        int spa_dedup_percentage;
      432 +        uint64_t spa_dedup_rotor;
      433 +
      434 +        /*
      435 +         * spa_refcnt & spa_config_lock must be the last elements
 368  436           * because refcount_t changes size based on compilation options.
 369  437           * In order for the MDB module to function correctly, the other
 370  438           * fields must remain in the same location.
 371  439           */
 372  440          spa_config_lock_t spa_config_lock[SCL_LOCKS]; /* config changes */
 373  441          refcount_t      spa_refcount;           /* number of opens */
      442 +
      443 +        uint64_t spa_ddt_meta_copies; /* amount of ddt-metadata copies */
      444 +
      445 +        /*
      446 +         * The following two fields are designed to restrict the distribution
      447 +         * of the deduplication entries. There are two possible states of these
      448 +         * vars:
      449 +         * 1) min=DITTO, max=DUPLICATED - it provides the old behavior
      450 +         * 2) min=DUPLICATED, MAX=DUPLICATED - new behavior: all entries into
      451 +         * the single zap.
      452 +         */
      453 +        enum ddt_class spa_ddt_class_min;
      454 +        enum ddt_class spa_ddt_class_max;
      455 +
      456 +        spa_meta_placement_t spa_meta_policy;
      457 +
      458 +        uint64_t spa_dedup_best_effort;
      459 +        uint64_t spa_dedup_lo_best_effort;
      460 +        uint64_t spa_dedup_hi_best_effort;
      461 +
      462 +        zfs_autosnap_t spa_autosnap;
      463 +
      464 +        zbookmark_phys_t spa_lszb;
      465 +
      466 +        int spa_obj_mtx_sz;
 374  467  };
 375  468  
      469 +/* possible in core size of all DDTs  */
      470 +extern uint64_t zfs_ddts_msize;
      471 +
      472 +/* spa sysevent taskq */
      473 +extern taskq_t *spa_sysevent_taskq;
      474 +
 376  475  extern const char *spa_config_path;
 377  476  
 378  477  extern void spa_taskq_dispatch_ent(spa_t *spa, zio_type_t t, zio_taskq_type_t q,
 379  478      task_func_t *func, void *arg, uint_t flags, taskq_ent_t *ent);
 380      -extern void spa_load_spares(spa_t *spa);
 381      -extern void spa_load_l2cache(spa_t *spa);
 382  479  
      480 +extern void spa_auto_trim_taskq_create(spa_t *spa);
      481 +extern void spa_man_trim_taskq_create(spa_t *spa);
      482 +extern void spa_auto_trim_taskq_destroy(spa_t *spa);
      483 +extern void spa_man_trim_taskq_destroy(spa_t *spa);
      484 +
 383  485  #ifdef  __cplusplus
 384  486  }
 385  487  #endif
 386  488  
 387  489  #endif  /* _SYS_SPA_IMPL_H */
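
The comment above spa_config_lock explains that refcount_t changes size with compilation options, so fields must keep fixed locations for the MDB module. The following is a minimal sketch of that layout effect, not code from this change: the two refcount types and the pool structs are hypothetical stand-ins, assumed only to differ in size the way a tracking and a non-tracking refcount_t might. It shows that fields declared before the variable-size member keep their offsets, while a field declared after it moves.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-ins: a refcount type without and with debug tracking. */
typedef struct {
	uint64_t rc_count;
} refcount_small_t;

typedef struct {
	uint64_t rc_count;
	uint64_t rc_tracking[4];	/* extra state present only in debug builds */
} refcount_big_t;

/* The same hypothetical pool struct, built against each refcount variant. */
struct pool_small {
	uint64_t		p_fixed;	/* declared before the refcount */
	refcount_small_t	p_refcount;
	uint64_t		p_trailing;	/* declared after the refcount */
};

struct pool_big {
	uint64_t		p_fixed;
	refcount_big_t		p_refcount;
	uint64_t		p_trailing;
};

/* Returns 1 when the leading field sits at the same offset in both builds. */
static int
fixed_offset_is_stable(void)
{
	return (offsetof(struct pool_small, p_fixed) ==
	    offsetof(struct pool_big, p_fixed));
}

/* Returns 1 when the trailing field sits at the same offset in both builds. */
static int
trailing_offset_is_stable(void)
{
	return (offsetof(struct pool_small, p_trailing) ==
	    offsetof(struct pool_big, p_trailing));
}

/* Self-check: only fields ahead of the variable-size member stay put. */
static void
layout_selfcheck(void)
{
	assert(fixed_offset_is_stable());
	assert(!trailing_offset_is_stable());
}
```

Under these assumptions, a debugger module that hard-codes offsets could still find p_fixed in either build, but not p_trailing. The same reasoning would apply to any field placed after spa_refcount in spa_t.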
    