NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-7603 Back port OpenZFS #188 Create tunable to ignore hole_birth feature
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Dan Fields <dan.fields@nexenta.com>
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-4582 update WRC test cases to allow use of write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
NEX-4207 WRC and dedup on the same pool cause system-panic
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-4193 WRC does not migrate data that belong to intermediate snapshots
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-3710 WRC improvements and bug-fixes
 * refactored WRC move-logic to use zio kmem_caches
 * replaced size and compression fields with blk_prop field
   (the same as in blkptr_t) to slightly reduce the size of wrc_block_t
   and use macros similar to those for blkptr_t to get PSIZE, LSIZE
   and COMPRESSION
 * reduced CPU load by reducing atomic calls
 * removed unused code
 * fixed naming of variables
 * fixed possible system panic after system restart
   with WRC enabled
 * fixed a race that caused a system panic
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
NEX-3558 KRRP Integration
4459 Typo in itadm(1m) usage message: delete-inititator
Reviewed by: Milan Jurik <milan.jurik@xylab.cz>
Reviewed by: Marcel Telka <marcel@telka.sk>
Approved by: Robert Mustacchi <rm@joyent.com>
4504 traverse_visitbp: visit DMU_GROUPUSED_OBJECT before DMU_USERUSED_OBJECT
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Andriy Gapon <andriy.gapon@hybridcluster.com>
Reviewed by: Saso Kiselkov <skiselkov.ml@gmail.com>
Approved by: Robert Mustacchi <rm@joyent.com>
4391 panic system rather than corrupting pool if we hit bug 4390
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Approved by: Gordon Ross <gwr@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
re #12619 rb4429 More dp->dp_config_rwlock holds
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties

          --- old/usr/src/uts/common/fs/zfs/dmu_traverse.c
          +++ new/usr/src/uts/common/fs/zfs/dmu_traverse.c
↓ open down ↓ 12 lines elided ↑ open up ↑
  13   13   * When distributing Covered Code, include this CDDL HEADER in each
  14   14   * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15   15   * If applicable, add the following below this CDDL HEADER, with the
  16   16   * fields enclosed by brackets "[]" replaced with your own identifying
  17   17   * information: Portions Copyright [yyyy] [name of copyright owner]
  18   18   *
  19   19   * CDDL HEADER END
  20   20   */
  21   21  /*
  22   22   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
       23 + * Copyright 2015 Nexenta Systems, Inc. All rights reserved.
  23   24   * Copyright (c) 2012, 2016 by Delphix. All rights reserved.
  24   25   */
  25   26  
  26   27  #include <sys/zfs_context.h>
  27   28  #include <sys/dmu_objset.h>
  28   29  #include <sys/dmu_traverse.h>
  29   30  #include <sys/dsl_dataset.h>
  30   31  #include <sys/dsl_dir.h>
  31   32  #include <sys/dsl_pool.h>
  32   33  #include <sys/dnode.h>
↓ open down ↓ 16 lines elided ↑ open up ↑
  49   50          boolean_t pd_cancel;
  50   51          boolean_t pd_exited;
  51   52          zbookmark_phys_t pd_resume;
  52   53  } prefetch_data_t;
  53   54  
  54   55  typedef struct traverse_data {
  55   56          spa_t *td_spa;
  56   57          uint64_t td_objset;
  57   58          blkptr_t *td_rootbp;
  58   59          uint64_t td_min_txg;
       60 +        uint64_t td_max_txg;
  59   61          zbookmark_phys_t *td_resume;
  60   62          int td_flags;
  61   63          prefetch_data_t *td_pfd;
  62   64          boolean_t td_paused;
  63   65          uint64_t td_hole_birth_enabled_txg;
  64   66          blkptr_cb_t *td_func;
  65   67          void *td_arg;
  66   68          boolean_t td_realloc_possible;
  67   69  } traverse_data_t;
  68   70  
↓ open down ↓ 115 lines elided ↑ open up ↑
 184  186  
 185  187          if (!(td->td_flags & TRAVERSE_PREFETCH_METADATA))
 186  188                  return;
 187  189          /*
 188  190           * If we are in the process of resuming, don't prefetch, because
 189  191           * some children will not be needed (and in fact may have already
 190  192           * been freed).
 191  193           */
 192  194          if (td->td_resume != NULL && !ZB_IS_ZERO(td->td_resume))
 193  195                  return;
 194      -        if (BP_IS_HOLE(bp) || bp->blk_birth <= td->td_min_txg)
      196 +        if (BP_IS_HOLE(bp) || bp->blk_birth <= td->td_min_txg ||
      197 +            bp->blk_birth >= td->td_max_txg)
 195  198                  return;
 196  199          if (BP_GET_LEVEL(bp) == 0 && BP_GET_TYPE(bp) != DMU_OT_DNODE)
 197  200                  return;
 198  201  
 199  202          (void) arc_read(NULL, td->td_spa, bp, NULL, NULL,
 200  203              ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL, &flags, zb);
 201  204  }
 202  205  
 203  206  static boolean_t
 204  207  prefetch_needed(prefetch_data_t *pfd, const blkptr_t *bp)
↓ open down ↓ 45 lines elided ↑ open up ↑
 250  253                   * then if SPA_FEATURE_HOLE_BIRTH was enabled before we wrote
 251  254                   * all the blocks we will visit as part of this traversal,
 252  255                   * then this hole must have always existed, so we can skip
 253  256                   * it.  We visit blocks born after (exclusive) td_min_txg.
 254  257                   *
 255  258                   * Note that the meta-dnode cannot be reallocated.
 256  259                   */
 257  260                  if (!send_holes_without_birth_time &&
 258  261                      (!td->td_realloc_possible ||
 259  262                      zb->zb_object == DMU_META_DNODE_OBJECT) &&
 260      -                    td->td_hole_birth_enabled_txg <= td->td_min_txg)
      263 +                    (td->td_hole_birth_enabled_txg <= td->td_min_txg ||
      264 +                    td->td_hole_birth_enabled_txg > td->td_max_txg))
 261  265                          return (0);
 262      -        } else if (bp->blk_birth <= td->td_min_txg) {
      266 +        } else if (bp->blk_birth <= td->td_min_txg ||
      267 +            bp->blk_birth >= td->td_max_txg) {
 263  268                  return (0);
 264  269          }
 265  270  
 266  271          if (pd != NULL && !pd->pd_exited && prefetch_needed(pd, bp)) {
 267  272                  uint64_t size = BP_GET_LSIZE(bp);
 268  273                  mutex_enter(&pd->pd_mtx);
 269  274                  ASSERT(pd->pd_bytes_fetched >= 0);
 270  275                  while (pd->pd_bytes_fetched < size && !pd->pd_exited)
 271  276                          cv_wait(&pd->pd_cv, &pd->pd_mtx);
 272  277                  pd->pd_bytes_fetched -= size;
↓ open down ↓ 6 lines elided ↑ open up ↑
 279  284                  if (err != 0)
 280  285                          goto post;
 281  286                  return (0);
 282  287          }
 283  288  
 284  289          if (td->td_flags & TRAVERSE_PRE) {
 285  290                  err = td->td_func(td->td_spa, NULL, bp, zb, dnp,
 286  291                      td->td_arg);
 287  292                  if (err == TRAVERSE_VISIT_NO_CHILDREN)
 288  293                          return (0);
      294 +                /* handle pausing at a common point */
      295 +                if (err == ERESTART)
      296 +                        td->td_paused = B_TRUE;
 289  297                  if (err != 0)
 290  298                          goto post;
 291  299          }
 292  300  
 293  301          if (BP_GET_LEVEL(bp) > 0) {
 294  302                  arc_flags_t flags = ARC_FLAG_WAIT;
 295  303                  int i;
 296  304                  blkptr_t *cbp;
 297  305                  int epb = BP_GET_LSIZE(bp) >> SPA_BLKPTRSHIFT;
 298  306  
↓ open down ↓ 114 lines elided ↑ open up ↑
 413  421                   * to dereference it.
 414  422                   */
 415  423                  td->td_resume->zb_blkid = zb->zb_blkid;
 416  424                  if (zb->zb_level > 0) {
 417  425                          td->td_resume->zb_blkid <<= zb->zb_level *
 418  426                              (dnp->dn_indblkshift - SPA_BLKPTRSHIFT);
 419  427                  }
 420  428                  td->td_paused = B_TRUE;
 421  429          }
 422  430  
      431 +        /* if we walked over all bps, the bookmark must be cleared */
      432 +        if (!err && !td->td_paused && td->td_resume != NULL &&
      433 +            bp == td->td_rootbp && td->td_pfd != NULL) {
      434 +                bzero(td->td_resume, sizeof (*td->td_resume));
      435 +        }
      436 +
 423  437          return (err);
 424  438  }
 425  439  
 426  440  static void
 427  441  prefetch_dnode_metadata(traverse_data_t *td, const dnode_phys_t *dnp,
 428  442      uint64_t objset, uint64_t object)
 429  443  {
 430  444          int j;
 431  445          zbookmark_phys_t czb;
 432  446  
↓ open down ↓ 106 lines elided ↑ open up ↑
 539  553          cv_broadcast(&td_main->td_pfd->pd_cv);
 540  554          mutex_exit(&td_main->td_pfd->pd_mtx);
 541  555  }
 542  556  
 543  557  /*
 544  558   * NB: dataset must not be changing on-disk (eg, is a snapshot or we are
 545  559   * in syncing context).
 546  560   */
 547  561  static int
 548  562  traverse_impl(spa_t *spa, dsl_dataset_t *ds, uint64_t objset, blkptr_t *rootbp,
 549      -    uint64_t txg_start, zbookmark_phys_t *resume, int flags,
 550      -    blkptr_cb_t func, void *arg)
      563 +    uint64_t txg_start, uint64_t txg_finish, zbookmark_phys_t *resume,
      564 +    int flags, blkptr_cb_t func, void *arg)
 551  565  {
 552  566          traverse_data_t td;
 553  567          prefetch_data_t pd = { 0 };
 554  568          zbookmark_phys_t czb;
 555  569          int err;
 556  570  
 557  571          ASSERT(ds == NULL || objset == ds->ds_object);
 558  572          ASSERT(!(flags & TRAVERSE_PRE) || !(flags & TRAVERSE_POST));
 559  573  
 560  574          td.td_spa = spa;
 561  575          td.td_objset = objset;
 562  576          td.td_rootbp = rootbp;
 563  577          td.td_min_txg = txg_start;
      578 +        td.td_max_txg = txg_finish;
 564  579          td.td_resume = resume;
 565  580          td.td_func = func;
 566  581          td.td_arg = arg;
 567  582          td.td_pfd = &pd;
 568  583          td.td_flags = flags;
 569  584          td.td_paused = B_FALSE;
 570  585          td.td_realloc_possible = (txg_start == 0 ? B_FALSE : B_TRUE);
 571  586  
 572  587          if (spa_feature_is_active(spa, SPA_FEATURE_HOLE_BIRTH)) {
 573  588                  VERIFY(spa_feature_enabled_txg(spa,
↓ open down ↓ 50 lines elided ↑ open up ↑
 624  639  /*
 625  640   * NB: dataset must not be changing on-disk (eg, is a snapshot or we are
 626  641   * in syncing context).
 627  642   */
 628  643  int
 629  644  traverse_dataset_resume(dsl_dataset_t *ds, uint64_t txg_start,
 630  645      zbookmark_phys_t *resume,
 631  646      int flags, blkptr_cb_t func, void *arg)
 632  647  {
 633  648          return (traverse_impl(ds->ds_dir->dd_pool->dp_spa, ds, ds->ds_object,
 634      -            &dsl_dataset_phys(ds)->ds_bp, txg_start, resume, flags, func, arg));
      649 +            &dsl_dataset_phys(ds)->ds_bp, txg_start, UINT64_MAX, resume, flags,
      650 +            func, arg));
 635  651  }
 636  652  
 637  653  int
 638  654  traverse_dataset(dsl_dataset_t *ds, uint64_t txg_start,
 639  655      int flags, blkptr_cb_t func, void *arg)
 640  656  {
 641  657          return (traverse_dataset_resume(ds, txg_start, NULL, flags, func, arg));
 642  658  }
 643  659  
 644  660  int
 645  661  traverse_dataset_destroyed(spa_t *spa, blkptr_t *blkptr,
 646  662      uint64_t txg_start, zbookmark_phys_t *resume, int flags,
 647  663      blkptr_cb_t func, void *arg)
 648  664  {
 649  665          return (traverse_impl(spa, NULL, ZB_DESTROYED_OBJSET,
 650      -            blkptr, txg_start, resume, flags, func, arg));
      666 +            blkptr, txg_start, UINT64_MAX, resume, flags, func, arg));
 651  667  }
 652  668  
 653  669  /*
 654  670   * NB: pool must not be changing on-disk (eg, from zdb or sync context).
 655  671   */
 656  672  int
 657      -traverse_pool(spa_t *spa, uint64_t txg_start, int flags,
 658      -    blkptr_cb_t func, void *arg)
      673 +traverse_pool(spa_t *spa, uint64_t txg_start, uint64_t txg_finish, int flags,
      674 +    blkptr_cb_t func, void *arg, zbookmark_phys_t *zb)
 659  675  {
 660      -        int err;
      676 +        int err = 0, lasterr = 0;
 661  677          dsl_pool_t *dp = spa_get_dsl(spa);
 662  678          objset_t *mos = dp->dp_meta_objset;
 663  679          boolean_t hard = (flags & TRAVERSE_HARD);
 664  680  
 665  681          /* visit the MOS */
 666      -        err = traverse_impl(spa, NULL, 0, spa_get_rootblkptr(spa),
 667      -            txg_start, NULL, flags, func, arg);
 668      -        if (err != 0)
 669      -                return (err);
      682 +        if (!zb || (zb->zb_objset == 0 && zb->zb_object == 0)) {
      683 +                err = traverse_impl(spa, NULL, 0, spa_get_rootblkptr(spa),
      684 +                    txg_start, txg_finish, NULL, flags, func, arg);
      685 +                if (err != 0)
      686 +                        return (err);
      687 +        }
 670  688  
 671  689          /* visit each dataset */
 672      -        for (uint64_t obj = 1; err == 0;
      690 +        for (uint64_t obj = (zb && !ZB_IS_ZERO(zb))? zb->zb_objset : 1;
      691 +            err == 0 || (err != ESRCH && hard);
 673  692              err = dmu_object_next(mos, &obj, B_FALSE, txg_start)) {
 674  693                  dmu_object_info_t doi;
 675  694  
 676  695                  err = dmu_object_info(mos, obj, &doi);
 677  696                  if (err != 0) {
 678  697                          if (hard)
 679  698                                  continue;
 680  699                          break;
 681  700                  }
 682  701  
 683  702                  if (doi.doi_bonus_type == DMU_OT_DSL_DATASET) {
 684  703                          dsl_dataset_t *ds;
      704 +                        objset_t *os;
      705 +                        boolean_t os_is_snapshot = B_FALSE;
 685  706                          uint64_t txg = txg_start;
      707 +                        uint64_t ctxg;
      708 +                        uint64_t max_txg = txg_finish;
 686  709  
 687  710                          dsl_pool_config_enter(dp, FTAG);
 688  711                          err = dsl_dataset_hold_obj(dp, obj, FTAG, &ds);
 689  712                          dsl_pool_config_exit(dp, FTAG);
 690  713                          if (err != 0) {
 691  714                                  if (hard)
 692  715                                          continue;
 693  716                                  break;
 694  717                          }
 695      -                        if (dsl_dataset_phys(ds)->ds_prev_snap_txg > txg)
      718 +
      719 +                        dsl_pool_config_enter(dp, FTAG);
      720 +                        err = dmu_objset_from_ds(ds, &os);
      721 +                        if (err == 0)
      722 +                                os_is_snapshot = dmu_objset_is_snapshot(os);
      723 +
      724 +                        dsl_pool_config_exit(dp, FTAG);
      725 +                        if (err != 0) {
      726 +                                dsl_dataset_rele(ds, FTAG);
      727 +                                if (hard)
      728 +                                        continue;
      729 +                                break;
      730 +                        }
      731 +                        ctxg = dsl_dataset_phys(ds)->ds_creation_txg;
      732 +
      733 +                        /* upper-bounded traverse walks over snapshots only */
      734 +                        if (max_txg != UINT64_MAX && !os_is_snapshot) {
      735 +                                dsl_dataset_rele(ds, FTAG);
      736 +                                continue;
      737 +                        }
      738 +                        if (max_txg != UINT64_MAX && ctxg >= max_txg) {
      739 +                                dsl_dataset_rele(ds, FTAG);
      740 +                                continue;
      741 +                        }
      742 +                        if (os_is_snapshot && ctxg <= txg_start) {
      743 +                                dsl_dataset_rele(ds, FTAG);
      744 +                                continue;
      745 +                        }
      746 +                        if (max_txg == UINT64_MAX &&
      747 +                            dsl_dataset_phys(ds)->ds_prev_snap_txg > txg)
 696  748                                  txg = dsl_dataset_phys(ds)->ds_prev_snap_txg;
 697      -                        err = traverse_dataset(ds, txg, flags, func, arg);
      749 +                        if (txg > max_txg)
      750 +                                max_txg = txg;
      751 +                        err = traverse_impl(spa, ds, ds->ds_object,
      752 +                            &dsl_dataset_phys(ds)->ds_bp,
      753 +                            txg, max_txg, zb, flags, func, arg);
 698  754                          dsl_dataset_rele(ds, FTAG);
 699      -                        if (err != 0)
      755 +                        if (err != 0) {
      756 +                                if (!hard)
      757 +                                        return (err);
      758 +                                lasterr = err;
      759 +                        }
      760 +                        if (zb && !ZB_IS_ZERO(zb))
 700  761                                  break;
 701  762                  }
 702  763          }
 703      -        if (err == ESRCH)
      764 +        if (err == ESRCH) {
      765 +                /* zero bookmark means we are done */
      766 +                if (zb)
      767 +                        bzero(zb, sizeof (*zb));
 704  768                  err = 0;
 705      -        return (err);
      769 +        }
      770 +        return (err != 0 ? err : lasterr);
 706  771  }
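
The txg window this change introduces can be summarized in a small standalone sketch. It is not the kernel code: `txg_window_t`, `block_in_window()` and `dataset_selected()` are hypothetical names invented for illustration, and the sketch assumes only what the diff above shows, namely that a block is skipped when `blk_birth <= td_min_txg` or `blk_birth >= td_max_txg`, and that `traverse_pool()` skips non-snapshots and out-of-range snapshots when `txg_finish` is not `UINT64_MAX`. The hole-birth check and the `max_txg` adjustment are left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the (td_min_txg, td_max_txg) pair. */
typedef struct txg_window {
	uint64_t min_txg;	/* exclusive lower bound */
	uint64_t max_txg;	/* exclusive upper bound; UINT64_MAX = unbounded */
} txg_window_t;

/*
 * Inverse of the skip test in the diff:
 * skip if birth <= min_txg || birth >= max_txg.
 */
static bool
block_in_window(const txg_window_t *w, uint64_t birth)
{
	return (birth > w->min_txg && birth < w->max_txg);
}

/*
 * Simplified version of the per-dataset checks added to traverse_pool():
 * a bounded traverse visits snapshots only, and only those created
 * strictly inside the window; an unbounded traverse still skips
 * snapshots created at or before min_txg.
 */
static bool
dataset_selected(const txg_window_t *w, bool is_snapshot,
    uint64_t creation_txg)
{
	bool bounded = (w->max_txg != UINT64_MAX);

	if (bounded && !is_snapshot)
		return (false);
	if (bounded && creation_txg >= w->max_txg)
		return (false);
	if (is_snapshot && creation_txg <= w->min_txg)
		return (false);
	return (true);
}
```

With a window of (100, 200), for example, blocks born in txg 101 through 199 are visited, while a live (non-snapshot) dataset is skipped entirely. Existing callers that pass `UINT64_MAX` keep the previous lower-bound-only behavior for blocks.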
    