NEX-15281 zfs_panic_recover() during hpr disable/enable
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-13629 zfs send -s: assertion failed: err != 0 || (dsp->dsa_sent_begin && dsp->dsa_sent_end), file: ../../common/fs/zfs/dmu_send.c, line: 1010
Reviewed by: Alex Deiter <alex.deiter@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-9575 zfs send -s panics
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Revert "NEX-7251 Resume_token is not cleared right after finishing receive"
This reverts commit 9e97a45e8cf6ca59307a39e2d3c11c6e845e4187.
NEX-7251 Resume_token is not cleared right after finishing receive
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
NEX-5928 KRRP: Integrate illumos/openzfs resume-token, to resume replication from a given synced offset
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5272 KRRP: replicate snapshot properties
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-5270 WBC: Incorrect error message when trying to 'zfs recv' into wrcached dataset
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5132 WBC: Do not allow recv to datasets with enabled writecache
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
6358 A faulted pool with only unavailable vdevs triggers assertion failure in libzfs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Serban Maduta <serban.maduta@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
6393 zfs receive a full send as a clone
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
6047 SPARC boot should support feature@embedded_data
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5946 zfs_ioc_space_snaps must check that firstsnap and lastsnap refer to snapshots
5945 zfs_ioc_send_space must ensure that fromsnap refers to a snapshot
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
5870 dmu_recv_end_check() leaks origin_head hold if error happens in drc_force branch
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5912 full stream can not be force-received into a dataset if it has a snapshot
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5809 Blowaway full receive in v1 pool causes kernel panic
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
5746 more checksumming in zfs send
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
5765 add support for estimating send stream size with lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>
5769 Cast 'zfs bad bloc' to ULL for x86
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Richard PALO <richard@NetBSD.org>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-3588 krrp panics in zfs:dmu_recv_end_check+13b () when running zfs tests.
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Kevin Crowe <kevin.crowe@nexenta.com>
NEX-3558 KRRP Integration
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Fixup merge results
re #12619 rb4429 More dp->dp_config_rwlock holds
Bug 10481 - Dry run option in 'zfs send' isn't the same as in NexentaStor 3.1

          --- old/usr/src/uts/common/fs/zfs/dmu_send.c
          +++ new/usr/src/uts/common/fs/zfs/dmu_send.c
↓ open down ↓ 12 lines elided ↑ open up ↑
  13   13   * When distributing Covered Code, include this CDDL HEADER in each
  14   14   * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15   15   * If applicable, add the following below this CDDL HEADER, with the
  16   16   * fields enclosed by brackets "[]" replaced with your own identifying
  17   17   * information: Portions Copyright [yyyy] [name of copyright owner]
  18   18   *
  19   19   * CDDL HEADER END
  20   20   */
  21   21  /*
  22   22   * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  23      - * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
  24   23   * Copyright (c) 2011, 2015 by Delphix. All rights reserved.
  25   24   * Copyright (c) 2014, Joyent, Inc. All rights reserved.
  26   25   * Copyright 2014 HybridCluster. All rights reserved.
       26 + * Copyright 2017 Nexenta Systems, Inc. All rights reserved.
  27   27   * Copyright 2016 RackTop Systems.
  28   28   * Copyright (c) 2014 Integros [integros.com]
  29   29   */
  30   30  
  31   31  #include <sys/dmu.h>
  32   32  #include <sys/dmu_impl.h>
  33   33  #include <sys/dmu_tx.h>
  34   34  #include <sys/dbuf.h>
  35   35  #include <sys/dnode.h>
  36   36  #include <sys/zfs_context.h>
↓ open down ↓ 10 lines elided ↑ open up ↑
  47   47  #include <sys/zfs_znode.h>
  48   48  #include <zfs_fletcher.h>
  49   49  #include <sys/avl.h>
  50   50  #include <sys/ddt.h>
  51   51  #include <sys/zfs_onexit.h>
  52   52  #include <sys/dmu_send.h>
  53   53  #include <sys/dsl_destroy.h>
  54   54  #include <sys/blkptr.h>
  55   55  #include <sys/dsl_bookmark.h>
  56   56  #include <sys/zfeature.h>
       57 +#include <sys/autosnap.h>
  57   58  #include <sys/bqueue.h>
  58   59  
       60 +#include "zfs_errno.h"
       61 +
  59   62  /* Set this tunable to TRUE to replace corrupt data with 0x2f5baddb10c */
  60   63  int zfs_send_corrupt_data = B_FALSE;
  61   64  int zfs_send_queue_length = 16 * 1024 * 1024;
  62   65  int zfs_recv_queue_length = 16 * 1024 * 1024;
  63   66  /* Set this tunable to FALSE to disable setting of DRR_FLAG_FREERECORDS */
  64   67  int zfs_send_set_freerecords_bit = B_TRUE;
  65   68  
  66   69  static char *dmu_recv_tag = "dmu_recv_tag";
  67   70  const char *recv_clone_name = "%recv";
  68   71  
↓ open down ↓ 34 lines elided ↑ open up ↑
 103  106           * receive_read().  Keeping this assertion ensures that we do not
 104  107           * inadvertently break backwards compatibility (causing the assertion
 105  108           * in receive_read() to trigger on old software).
 106  109           *
 107  110           * Removing the assertions could be rolled into a new feature that uses
 108  111           * data that isn't 8-byte aligned; if the assertions were removed, a
 109  112           * feature flag would have to be added.
 110  113           */
 111  114  
 112  115          ASSERT0(len % 8);
      116 +        ASSERT(buf != NULL);
 113  117  
 114      -        dsp->dsa_err = vn_rdwr(UIO_WRITE, dsp->dsa_vp,
 115      -            (caddr_t)buf, len,
 116      -            0, UIO_SYSSPACE, FAPPEND, RLIM64_INFINITY, CRED(), &resid);
 117      -
      118 +        dsp->dsa_err = 0;
      119 +        if (!dsp->sendsize) {
      120 +                /* if vp is NULL, then the send is from krrp */
      121 +                if (dsp->dsa_vp != NULL) {
      122 +                        dsp->dsa_err = vn_rdwr(UIO_WRITE, dsp->dsa_vp,
      123 +                            (caddr_t)buf, len,
      124 +                            0, UIO_SYSSPACE, FAPPEND, RLIM64_INFINITY,
      125 +                            CRED(), &resid);
      126 +                } else {
      127 +                        ASSERT(dsp->dsa_krrp_task != NULL);
      128 +                        dsp->dsa_err = dmu_krrp_buffer_write(buf, len,
      129 +                            dsp->dsa_krrp_task);
      130 +                }
      131 +        }
 118  132          mutex_enter(&ds->ds_sendstream_lock);
 119  133          *dsp->dsa_off += len;
 120  134          mutex_exit(&ds->ds_sendstream_lock);
 121  135  
 122  136          return (dsp->dsa_err);
 123  137  }
 124  138  
      139 +static int
      140 +dump_bytes_with_checksum(dmu_sendarg_t *dsp, void *buf, int len)
      141 +{
      142 +        if (!dsp->sendsize && (dsp->dsa_krrp_task == NULL ||
      143 +            dsp->dsa_krrp_task->buffer_args.force_cksum)) {
      144 +                (void) fletcher_4_incremental_native(buf, len, &dsp->dsa_zc);
      145 +        }
      146 +
      147 +        return (dump_bytes(dsp, buf, len));
      148 +}
      149 +
 125  150  /*
 126  151   * For all record types except BEGIN, fill in the checksum (overlaid in
 127  152   * drr_u.drr_checksum.drr_checksum).  The checksum verifies everything
 128  153   * up to the start of the checksum itself.
 129  154   */
 130  155  static int
 131  156  dump_record(dmu_sendarg_t *dsp, void *payload, int payload_len)
 132  157  {
      158 +        boolean_t do_checksum = (dsp->dsa_krrp_task == NULL ||
      159 +            dsp->dsa_krrp_task->buffer_args.force_cksum);
      160 +
 133  161          ASSERT3U(offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
 134  162              ==, sizeof (dmu_replay_record_t) - sizeof (zio_cksum_t));
 135      -        (void) fletcher_4_incremental_native(dsp->dsa_drr,
 136      -            offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
 137      -            &dsp->dsa_zc);
      163 +
 138  164          if (dsp->dsa_drr->drr_type == DRR_BEGIN) {
 139  165                  dsp->dsa_sent_begin = B_TRUE;
 140      -        } else {
 141      -                ASSERT(ZIO_CHECKSUM_IS_ZERO(&dsp->dsa_drr->drr_u.
 142      -                    drr_checksum.drr_checksum));
 143      -                dsp->dsa_drr->drr_u.drr_checksum.drr_checksum = dsp->dsa_zc;
 144  166          }
      167 +
 145  168          if (dsp->dsa_drr->drr_type == DRR_END) {
 146  169                  dsp->dsa_sent_end = B_TRUE;
 147  170          }
 148      -        (void) fletcher_4_incremental_native(&dsp->dsa_drr->
 149      -            drr_u.drr_checksum.drr_checksum,
 150      -            sizeof (zio_cksum_t), &dsp->dsa_zc);
      171 +
      172 +        if (!dsp->sendsize && do_checksum) {
      173 +                (void) fletcher_4_incremental_native(dsp->dsa_drr,
      174 +                    offsetof(dmu_replay_record_t,
      175 +                    drr_u.drr_checksum.drr_checksum),
      176 +                    &dsp->dsa_zc);
      177 +                if (dsp->dsa_drr->drr_type != DRR_BEGIN) {
      178 +                        ASSERT(ZIO_CHECKSUM_IS_ZERO(&dsp->dsa_drr->drr_u.
      179 +                            drr_checksum.drr_checksum));
      180 +                        dsp->dsa_drr->drr_u.drr_checksum.drr_checksum =
      181 +                            dsp->dsa_zc;
      182 +                }
      183 +
      184 +                (void) fletcher_4_incremental_native(&dsp->dsa_drr->
      185 +                    drr_u.drr_checksum.drr_checksum,
      186 +                    sizeof (zio_cksum_t), &dsp->dsa_zc);
      187 +        }
      188 +
 151  189          if (dump_bytes(dsp, dsp->dsa_drr, sizeof (dmu_replay_record_t)) != 0)
 152  190                  return (SET_ERROR(EINTR));
 153  191          if (payload_len != 0) {
 154      -                (void) fletcher_4_incremental_native(payload, payload_len,
 155      -                    &dsp->dsa_zc);
 156      -                if (dump_bytes(dsp, payload, payload_len) != 0)
      192 +                if (dump_bytes_with_checksum(dsp, payload, payload_len) != 0)
 157  193                          return (SET_ERROR(EINTR));
 158  194          }
 159  195          return (0);
 160  196  }
 161  197  
 162  198  /*
 163  199   * Fill in the drr_free struct, or perform aggregation if the previous record is
 164  200   * also a free record, and the two are adjacent.
 165  201   *
 166  202   * Note that we send free records even for a full send, because we want to be
↓ open down ↓ 187 lines elided ↑ open up ↑
 354  390          drrw->drr_psize = BPE_GET_PSIZE(bp);
 355  391  
 356  392          decode_embedded_bp_compressed(bp, buf);
 357  393  
 358  394          if (dump_record(dsp, buf, P2ROUNDUP(drrw->drr_psize, 8)) != 0)
 359  395                  return (EINTR);
 360  396          return (0);
 361  397  }
 362  398  
 363  399  static int
 364      -dump_spill(dmu_sendarg_t *dsp, uint64_t object, int blksz, void *data)
      400 +dump_spill(dmu_sendarg_t *dsp, uint64_t object,
      401 +    const blkptr_t *bp, const zbookmark_phys_t *zb)
 365  402  {
      403 +        int rc = 0;
 366  404          struct drr_spill *drrs = &(dsp->dsa_drr->drr_u.drr_spill);
       405 +        arc_flags_t aflags = ARC_FLAG_WAIT;
      406 +        int blksz = BP_GET_LSIZE(bp);
      407 +        arc_buf_t *abuf;
 367  408  
 368  409          if (dsp->dsa_pending_op != PENDING_NONE) {
 369  410                  if (dump_record(dsp, NULL, 0) != 0)
 370  411                          return (SET_ERROR(EINTR));
 371  412                  dsp->dsa_pending_op = PENDING_NONE;
 372  413          }
 373  414  
 374  415          /* write a SPILL record */
 375  416          bzero(dsp->dsa_drr, sizeof (dmu_replay_record_t));
 376  417          dsp->dsa_drr->drr_type = DRR_SPILL;
 377  418          drrs->drr_object = object;
 378  419          drrs->drr_length = blksz;
 379  420          drrs->drr_toguid = dsp->dsa_toguid;
 380  421  
 381      -        if (dump_record(dsp, data, blksz) != 0)
       422 +        if (dump_record(dsp, NULL, 0) != 0)
 382  423                  return (SET_ERROR(EINTR));
      424 +
       425 +        /*
       426 +         * If dsa_krrp_task is not NULL, then the send is from krrp and we
       427 +         * can try to bypass copying data to an intermediate buffer.
       428 +         */
      429 +        if (!dsp->sendsize && dsp->dsa_krrp_task != NULL) {
      430 +                rc = dmu_krrp_direct_arc_read(dsp->dsa_os->os_spa,
      431 +                    dsp->dsa_krrp_task, &dsp->dsa_zc, bp);
       432 +                /*
       433 +                 * rc == 0: data was copied directly from ARC
       434 +                 * to the krrp buffer. rc == EINTR: the send
       435 +                 * was interrupted. Any other rc: we cannot
       436 +                 * zero-copy and must use the slow path below.
       437 +                 */
      438 +                if (rc == 0 || rc == EINTR)
      439 +                        return (rc);
      440 +
      441 +                ASSERT3U(rc, ==, ENODATA);
      442 +        }
      443 +
      444 +        if (arc_read(NULL, dsp->dsa_os->os_spa, bp, arc_getbuf_func, &abuf,
      445 +            ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL,
      446 +            &aflags, zb) != 0)
      447 +                return (SET_ERROR(EIO));
      448 +
      449 +        rc = dump_bytes_with_checksum(dsp, abuf->b_data, blksz);
      450 +        arc_buf_destroy(abuf, &abuf);
      451 +        if (rc != 0)
      452 +                return (SET_ERROR(EINTR));
      453 +
 383  454          return (0);
 384  455  }
 385  456  
 386  457  static int
 387  458  dump_freeobjects(dmu_sendarg_t *dsp, uint64_t firstobj, uint64_t numobjs)
 388  459  {
 389  460          struct drr_freeobjects *drrfo = &(dsp->dsa_drr->drr_u.drr_freeobjects);
 390  461  
 391  462          /*
 392  463           * If there is a pending op, but it's not PENDING_FREEOBJECTS,
↓ open down ↓ 236 lines elided ↑ open up ↑
 629  700  
 630  701                  dnode_phys_t *blk = abuf->b_data;
 631  702                  uint64_t dnobj = zb->zb_blkid * (blksz >> DNODE_SHIFT);
 632  703                  for (int i = 0; i < blksz >> DNODE_SHIFT; i++) {
 633  704                          err = dump_dnode(dsa, dnobj + i, blk + i);
 634  705                          if (err != 0)
 635  706                                  break;
 636  707                  }
 637  708                  arc_buf_destroy(abuf, &abuf);
 638  709          } else if (type == DMU_OT_SA) {
 639      -                arc_flags_t aflags = ARC_FLAG_WAIT;
 640      -                arc_buf_t *abuf;
 641      -                int blksz = BP_GET_LSIZE(bp);
 642      -
 643      -                if (arc_read(NULL, spa, bp, arc_getbuf_func, &abuf,
 644      -                    ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL,
 645      -                    &aflags, zb) != 0)
 646      -                        return (SET_ERROR(EIO));
 647      -
 648      -                err = dump_spill(dsa, zb->zb_object, blksz, abuf->b_data);
 649      -                arc_buf_destroy(abuf, &abuf);
       710 +                /*
       711 +                 * The upstream code has an arc_read() call here, but we
       712 +                 * moved it into dump_spill(), since we want to take
       713 +                 * advantage of zero-copy of the buffer if possible.
       714 +                 */
      715 +                err = dump_spill(dsa, zb->zb_object, bp, zb);
 650  716          } else if (backup_do_embed(dsa, bp)) {
 651  717                  /* it's an embedded level-0 block of a regular object */
 652  718                  int blksz = dblkszsec << SPA_MINBLOCKSHIFT;
 653  719                  ASSERT0(zb->zb_level);
 654  720                  err = dump_write_embedded(dsa, zb->zb_object,
 655  721                      zb->zb_blkid * blksz, blksz, bp);
 656  722          } else {
 657  723                  /* it's a level-0 block of a regular object */
 658  724                  arc_flags_t aflags = ARC_FLAG_WAIT;
 659  725                  arc_buf_t *abuf;
↓ open down ↓ 20 lines elided ↑ open up ↑
 680  746                  boolean_t request_compressed =
 681  747                      (dsa->dsa_featureflags & DMU_BACKUP_FEATURE_COMPRESSED) &&
 682  748                      !split_large_blocks && !BP_SHOULD_BYTESWAP(bp) &&
 683  749                      !BP_IS_EMBEDDED(bp) && !DMU_OT_IS_METADATA(BP_GET_TYPE(bp));
 684  750  
 685  751                  ASSERT0(zb->zb_level);
 686  752                  ASSERT(zb->zb_object > dsa->dsa_resume_object ||
 687  753                      (zb->zb_object == dsa->dsa_resume_object &&
 688  754                      zb->zb_blkid * blksz >= dsa->dsa_resume_offset));
 689  755  
 690      -                ASSERT0(zb->zb_level);
 691      -                ASSERT(zb->zb_object > dsa->dsa_resume_object ||
 692      -                    (zb->zb_object == dsa->dsa_resume_object &&
 693      -                    zb->zb_blkid * blksz >= dsa->dsa_resume_offset));
 694      -
 695  756                  ASSERT3U(blksz, ==, BP_GET_LSIZE(bp));
 696  757  
 697  758                  enum zio_flag zioflags = ZIO_FLAG_CANFAIL;
 698  759                  if (request_compressed)
 699  760                          zioflags |= ZIO_FLAG_RAW;
 700  761                  if (arc_read(NULL, spa, bp, arc_getbuf_func, &abuf,
 701  762                      ZIO_PRIORITY_ASYNC_READ, zioflags, &aflags, zb) != 0) {
 702  763                          if (zfs_send_corrupt_data) {
 703  764                                  /* Send a block filled with 0x"zfs badd bloc" */
 704  765                                  abuf = arc_alloc_buf(spa, &abuf, ARC_BUFC_DATA,
↓ open down ↓ 12 lines elided ↑ open up ↑
 717  778  
 718  779                  if (split_large_blocks) {
 719  780                          ASSERT3U(arc_get_compression(abuf), ==,
 720  781                              ZIO_COMPRESS_OFF);
 721  782                          char *buf = abuf->b_data;
 722  783                          while (blksz > 0 && err == 0) {
 723  784                                  int n = MIN(blksz, SPA_OLD_MAXBLOCKSIZE);
 724  785                                  err = dump_write(dsa, type, zb->zb_object,
 725  786                                      offset, n, n, NULL, buf);
 726  787                                  offset += n;
  727  788                                  buf += n;
  728  789                                  blksz -= n;
  729  790                          }
  730  791                  } else {
  731  792                          err = dump_write(dsa, type, zb->zb_object, offset,
  732  793                              blksz, arc_buf_size(abuf), bp, abuf->b_data);
  733  794                  }
  734  795                  arc_buf_destroy(abuf, &abuf);
  735  796          }
  736  797  
  737  798          ASSERT(err == 0 || err == EINTR);
↓ open down ↓ 10 lines elided ↑ open up ↑
 748  808          kmem_free(data, sizeof (*data));
 749  809          return (tmp);
 750  810  }
 751  811  
 752  812  /*
 753  813   * Actually do the bulk of the work in a zfs send.
 754  814   *
 755  815   * Note: Releases dp using the specified tag.
 756  816   */
 757  817  static int
 758      -dmu_send_impl(void *tag, dsl_pool_t *dp, dsl_dataset_t *to_ds,
      818 +dmu_send_impl_ss(void *tag, dsl_pool_t *dp, dsl_dataset_t *to_ds,
 759  819      zfs_bookmark_phys_t *ancestor_zb, boolean_t is_clone,
 760  820      boolean_t embedok, boolean_t large_block_ok, boolean_t compressok,
 761      -    int outfd, uint64_t resumeobj, uint64_t resumeoff,
 762      -    vnode_t *vp, offset_t *off)
      821 +    int outfd, uint64_t resumeobj, uint64_t resumeoff, vnode_t *vp,
      822 +    offset_t *off, boolean_t sendsize, dmu_krrp_task_t *krrp_task)
 763  823  {
 764  824          objset_t *os;
 765  825          dmu_replay_record_t *drr;
 766  826          dmu_sendarg_t *dsp;
 767  827          int err;
 768  828          uint64_t fromtxg = 0;
 769  829          uint64_t featureflags = 0;
 770  830          struct send_thread_arg to_arg = { 0 };
 771  831  
 772  832          err = dmu_objset_from_ds(to_ds, &os);
↓ open down ↓ 70 lines elided ↑ open up ↑
 843  903  
 844  904          dsp = kmem_zalloc(sizeof (dmu_sendarg_t), KM_SLEEP);
 845  905  
 846  906          dsp->dsa_drr = drr;
 847  907          dsp->dsa_vp = vp;
 848  908          dsp->dsa_outfd = outfd;
 849  909          dsp->dsa_proc = curproc;
 850  910          dsp->dsa_os = os;
 851  911          dsp->dsa_off = off;
 852  912          dsp->dsa_toguid = dsl_dataset_phys(to_ds)->ds_guid;
      913 +        dsp->dsa_krrp_task = krrp_task;
 853  914          dsp->dsa_pending_op = PENDING_NONE;
 854  915          dsp->dsa_featureflags = featureflags;
      916 +        dsp->sendsize = sendsize;
 855  917          dsp->dsa_resume_object = resumeobj;
 856  918          dsp->dsa_resume_offset = resumeoff;
 857  919  
 858  920          mutex_enter(&to_ds->ds_sendstream_lock);
 859  921          list_insert_head(&to_ds->ds_sendstreams, dsp);
 860  922          mutex_exit(&to_ds->ds_sendstream_lock);
 861  923  
 862  924          dsl_dataset_long_hold(to_ds, FTAG);
 863  925          dsl_pool_rele(dp, tag);
 864  926  
↓ open down ↓ 31 lines elided ↑ open up ↑
 896  958          to_arg.flags = TRAVERSE_PRE | TRAVERSE_PREFETCH;
 897  959          (void) thread_create(NULL, 0, send_traverse_thread, &to_arg, 0, curproc,
 898  960              TS_RUN, minclsyspri);
 899  961  
 900  962          struct send_block_record *to_data;
 901  963          to_data = bqueue_dequeue(&to_arg.q);
 902  964  
 903  965          while (!to_data->eos_marker && err == 0) {
 904  966                  err = do_dump(dsp, to_data);
 905  967                  to_data = get_next_record(&to_arg.q, to_data);
 906      -                if (issig(JUSTLOOKING) && issig(FORREAL))
      968 +                if (vp != NULL && issig(JUSTLOOKING) && issig(FORREAL))
 907  969                          err = EINTR;
 908  970          }
 909  971  
 910  972          if (err != 0) {
 911  973                  to_arg.cancel = B_TRUE;
 912  974                  while (!to_data->eos_marker) {
 913  975                          to_data = get_next_record(&to_arg.q, to_data);
 914  976                  }
 915  977          }
 916  978          kmem_free(to_data, sizeof (*to_data));
↓ open down ↓ 33 lines elided ↑ open up ↑
 950 1012  
 951 1013          kmem_free(drr, sizeof (dmu_replay_record_t));
 952 1014          kmem_free(dsp, sizeof (dmu_sendarg_t));
 953 1015  
 954 1016          dsl_dataset_long_rele(to_ds, FTAG);
 955 1017  
 956 1018          return (err);
 957 1019  }
 958 1020  
 959 1021  int
     1022 +dmu_send_impl(void *tag, dsl_pool_t *dp, dsl_dataset_t *to_ds,
     1023 +    zfs_bookmark_phys_t *ancestor_zb, boolean_t is_clone, boolean_t embedok,
     1024 +    boolean_t large_block_ok, boolean_t compressok, int outfd,
     1025 +    uint64_t resumeobj, uint64_t resumeoff, vnode_t *vp, offset_t *off,
     1026 +    dmu_krrp_task_t *krrp_task)
     1027 +{
     1028 +        return (dmu_send_impl_ss(tag, dp, to_ds, ancestor_zb, is_clone,
     1029 +            embedok, large_block_ok, compressok, outfd, resumeobj, resumeoff,
     1030 +            vp, off, B_FALSE, krrp_task));
     1031 +}
     1032 +
     1033 +int
 960 1034  dmu_send_obj(const char *pool, uint64_t tosnap, uint64_t fromsnap,
 961 1035      boolean_t embedok, boolean_t large_block_ok, boolean_t compressok,
 962      -    int outfd, vnode_t *vp, offset_t *off)
     1036 +    int outfd, vnode_t *vp, offset_t *off, boolean_t sendsize)
 963 1037  {
 964 1038          dsl_pool_t *dp;
 965 1039          dsl_dataset_t *ds;
 966 1040          dsl_dataset_t *fromds = NULL;
 967 1041          int err;
 968 1042  
 969 1043          err = dsl_pool_hold(pool, FTAG, &dp);
 970 1044          if (err != 0)
 971 1045                  return (err);
 972 1046  
↓ open down ↓ 14 lines elided ↑ open up ↑
 987 1061                          return (err);
 988 1062                  }
 989 1063                  if (!dsl_dataset_is_before(ds, fromds, 0))
 990 1064                          err = SET_ERROR(EXDEV);
 991 1065                  zb.zbm_creation_time =
 992 1066                      dsl_dataset_phys(fromds)->ds_creation_time;
 993 1067                  zb.zbm_creation_txg = dsl_dataset_phys(fromds)->ds_creation_txg;
 994 1068                  zb.zbm_guid = dsl_dataset_phys(fromds)->ds_guid;
 995 1069                  is_clone = (fromds->ds_dir != ds->ds_dir);
 996 1070                  dsl_dataset_rele(fromds, FTAG);
 997      -                err = dmu_send_impl(FTAG, dp, ds, &zb, is_clone,
 998      -                    embedok, large_block_ok, compressok, outfd, 0, 0, vp, off);
     1071 +                err = dmu_send_impl_ss(FTAG, dp, ds, &zb, is_clone,
     1072 +                    embedok, large_block_ok, compressok, outfd, 0, 0, vp, off,
      1073 +                    sendsize, NULL);
 999 1074          } else {
1000      -                err = dmu_send_impl(FTAG, dp, ds, NULL, B_FALSE,
1001      -                    embedok, large_block_ok, compressok, outfd, 0, 0, vp, off);
     1075 +                err = dmu_send_impl_ss(FTAG, dp, ds, NULL, B_FALSE,
     1076 +                    embedok, large_block_ok, compressok, outfd, 0, 0, vp, off,
      1077 +                    sendsize, NULL);
1002 1078          }
1003 1079          dsl_dataset_rele(ds, FTAG);
1004 1080          return (err);
1005 1081  }
1006 1082  
1007 1083  int
1008 1084  dmu_send(const char *tosnap, const char *fromsnap, boolean_t embedok,
1009 1085      boolean_t large_block_ok, boolean_t compressok, int outfd,
1010 1086      uint64_t resumeobj, uint64_t resumeoff,
1011 1087      vnode_t *vp, offset_t *off)
↓ open down ↓ 56 lines elided ↑ open up ↑
1068 1144                          }
1069 1145                  } else {
1070 1146                          err = dsl_bookmark_lookup(dp, fromsnap, ds, &zb);
1071 1147                  }
1072 1148                  if (err != 0) {
1073 1149                          dsl_dataset_rele(ds, FTAG);
1074 1150                          dsl_pool_rele(dp, FTAG);
1075 1151                          return (err);
1076 1152                  }
1077 1153                  err = dmu_send_impl(FTAG, dp, ds, &zb, is_clone,
1078      -                    embedok, large_block_ok, compressok,
1079      -                    outfd, resumeobj, resumeoff, vp, off);
     1154 +                    embedok, large_block_ok, compressok, outfd,
     1155 +                    resumeobj, resumeoff, vp, off, NULL);
1080 1156          } else {
1081 1157                  err = dmu_send_impl(FTAG, dp, ds, NULL, B_FALSE,
1082      -                    embedok, large_block_ok, compressok,
1083      -                    outfd, resumeobj, resumeoff, vp, off);
     1158 +                    embedok, large_block_ok, compressok, outfd,
     1159 +                    resumeobj, resumeoff, vp, off, NULL);
1084 1160          }
1085 1161          if (owned)
1086 1162                  dsl_dataset_disown(ds, FTAG);
1087 1163          else
1088 1164                  dsl_dataset_rele(ds, FTAG);
1089 1165          return (err);
1090 1166  }
1091 1167  
1092 1168  static int
1093 1169  dmu_adjust_send_estimate_for_indirects(dsl_dataset_t *ds, uint64_t uncompressed,
... 156 lines elided ...
1250 1326  
1251 1327  typedef struct dmu_recv_begin_arg {
1252 1328          const char *drba_origin;
1253 1329          dmu_recv_cookie_t *drba_cookie;
1254 1330          cred_t *drba_cred;
1255 1331          uint64_t drba_snapobj;
1256 1332  } dmu_recv_begin_arg_t;
1257 1333  
1258 1334  static int
1259 1335  recv_begin_check_existing_impl(dmu_recv_begin_arg_t *drba, dsl_dataset_t *ds,
1260      -    uint64_t fromguid)
     1336 +    uint64_t fromguid, dmu_tx_t *tx)
1261 1337  {
1262 1338          uint64_t val;
1263 1339          int error;
1264 1340          dsl_pool_t *dp = ds->ds_dir->dd_pool;
1265 1341  
1266      -        /* temporary clone name must not exist */
1267      -        error = zap_lookup(dp->dp_meta_objset,
1268      -            dsl_dir_phys(ds->ds_dir)->dd_child_dir_zapobj, recv_clone_name,
1269      -            8, 1, &val);
1270      -        if (error != ENOENT)
1271      -                return (error == 0 ? EBUSY : error);
     1342 +        if (dmu_tx_is_syncing(tx)) {
     1343 +                /* temporary clone name must not exist */
     1344 +                error = zap_lookup(dp->dp_meta_objset,
     1345 +                    dsl_dir_phys(ds->ds_dir)->dd_child_dir_zapobj,
     1346 +                    recv_clone_name, 8, 1, &val);
     1347 +                if (error == 0) {
     1348 +                        dsl_dataset_t *tds;
1272 1349  
     1350 +                /* check whether it is currently in use */
     1351 +                        error = dsl_dataset_own_obj(dp, val, FTAG, &tds);
     1352 +                        if (!error) {
     1353 +                                char name[ZFS_MAX_DATASET_NAME_LEN];
     1354 +
     1355 +                                dsl_dataset_name(tds, name);
     1356 +                                dsl_dataset_disown(tds, FTAG);
     1357 +
     1358 +                                error = dsl_dataset_hold(dp, name, FTAG, &tds);
     1359 +                                if (!error) {
     1360 +                                        dsl_destroy_head_sync_impl(tds, tx);
     1361 +                                        dsl_dataset_rele(tds, FTAG);
     1362 +                                        error = ENOENT;
     1363 +                                }
     1364 +                        } else {
     1365 +                                error = 0;
     1366 +                        }
     1367 +                }
     1368 +                if (error != ENOENT) {
     1369 +                        return (error == 0 ?
     1370 +                            SET_ERROR(EBUSY) : SET_ERROR(error));
     1371 +                }
     1372 +        }
     1373 +
1273 1374          /* new snapshot name must not exist */
1274 1375          error = zap_lookup(dp->dp_meta_objset,
1275 1376              dsl_dataset_phys(ds)->ds_snapnames_zapobj,
1276 1377              drba->drba_cookie->drc_tosnap, 8, 1, &val);
1277 1378          if (error != ENOENT)
1278      -                return (error == 0 ? EEXIST : error);
     1379 +                return (error == 0 ? SET_ERROR(EEXIST) : SET_ERROR(error));
1279 1380  
1280 1381          /*
1281 1382           * Check snapshot limit before receiving. We'll recheck again at the
1282 1383           * end, but might as well abort before receiving if we're already over
1283 1384           * the limit.
1284 1385           *
1285 1386           * Note that we do not check the file system limit with
1286 1387           * dsl_dir_fscount_check because the temporary %clones don't count
1287 1388           * against that limit.
1288 1389           */
... 103 lines elided ...
1392 1493           * feature enabled if the stream has LARGE_BLOCKS.
1393 1494           */
1394 1495          if ((featureflags & DMU_BACKUP_FEATURE_LARGE_BLOCKS) &&
1395 1496              !spa_feature_is_enabled(dp->dp_spa, SPA_FEATURE_LARGE_BLOCKS))
1396 1497                  return (SET_ERROR(ENOTSUP));
1397 1498  
1398 1499          error = dsl_dataset_hold(dp, tofs, FTAG, &ds);
1399 1500          if (error == 0) {
1400 1501                  /* target fs already exists; recv into temp clone */
1401 1502  
     1503 +                if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_WBC)) {
     1504 +                        objset_t *os = NULL;
     1505 +
     1506 +                        error = dmu_objset_from_ds(ds, &os);
     1507 +                        if (error) {
     1508 +                                dsl_dataset_rele(ds, FTAG);
     1509 +                                return (error);
     1510 +                        }
     1511 +
     1512 +                        /* Recv is impossible into DS that uses WBC */
     1513 +                        if (os->os_wbc_mode != ZFS_WBC_MODE_OFF) {
     1514 +                                dsl_dataset_rele(ds, FTAG);
     1515 +                                return (SET_ERROR(EKZFS_WBCNOTSUP));
     1516 +                        }
     1517 +                }
     1518 +
1402 1519                  /* Can't recv a clone into an existing fs */
1403 1520                  if (flags & DRR_FLAG_CLONE || drba->drba_origin) {
1404 1521                          dsl_dataset_rele(ds, FTAG);
1405 1522                          return (SET_ERROR(EINVAL));
1406 1523                  }
1407 1524  
1408      -                error = recv_begin_check_existing_impl(drba, ds, fromguid);
     1525 +                error = recv_begin_check_existing_impl(drba, ds, fromguid, tx);
1409 1526                  dsl_dataset_rele(ds, FTAG);
1410 1527          } else if (error == ENOENT) {
1411 1528                  /* target fs does not exist; must be a full backup or clone */
1412 1529                  char buf[ZFS_MAX_DATASET_NAME_LEN];
1413 1530  
1414 1531                  /*
1415 1532                   * If it's a non-clone incremental, we are missing the
1416 1533                   * target fs, so fail the recv.
1417 1534                   */
1418 1535                  if (fromguid != 0 && !(flags & DRR_FLAG_CLONE ||
... 9 lines elided ...
1428 1545                      !(flags & DRR_FLAG_FREERECORDS))
1429 1546                          return (SET_ERROR(EINVAL));
1430 1547  
1431 1548                  /* Open the parent of tofs */
1432 1549                  ASSERT3U(strlen(tofs), <, sizeof (buf));
1433 1550                  (void) strlcpy(buf, tofs, strrchr(tofs, '/') - tofs + 1);
1434 1551                  error = dsl_dataset_hold(dp, buf, FTAG, &ds);
1435 1552                  if (error != 0)
1436 1553                          return (error);
1437 1554  
     1555 +                if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_WBC)) {
     1556 +                        objset_t *os = NULL;
     1557 +
     1558 +                        error = dmu_objset_from_ds(ds, &os);
     1559 +                        if (error) {
     1560 +                                dsl_dataset_rele(ds, FTAG);
     1561 +                                return (error);
     1562 +                        }
     1563 +
     1564 +                        /* Recv is impossible into DS that uses WBC */
     1565 +                        if (os->os_wbc_mode != ZFS_WBC_MODE_OFF) {
     1566 +                                dsl_dataset_rele(ds, FTAG);
     1567 +                                return (SET_ERROR(EKZFS_WBCNOTSUP));
     1568 +                        }
     1569 +                }
     1570 +
1438 1571                  /*
1439 1572                   * Check filesystem and snapshot limits before receiving. We'll
1440 1573                   * recheck snapshot limits again at the end (we create the
1441 1574                   * filesystems and increment those counts during begin_sync).
1442 1575                   */
1443 1576                  error = dsl_fs_ss_limit_check(ds->ds_dir, 1,
1444 1577                      ZFS_PROP_FILESYSTEM_LIMIT, NULL, drba->drba_cred);
1445 1578                  if (error != 0) {
1446 1579                          dsl_dataset_rele(ds, FTAG);
1447 1580                          return (error);
... 194 lines elided ...
1642 1775          if (!DS_IS_INCONSISTENT(ds)) {
1643 1776                  dsl_dataset_rele(ds, FTAG);
1644 1777                  return (SET_ERROR(EINVAL));
1645 1778          }
1646 1779  
1647 1780          /* check that there is resuming data, and that the toguid matches */
1648 1781          if (!dsl_dataset_is_zapified(ds)) {
1649 1782                  dsl_dataset_rele(ds, FTAG);
1650 1783                  return (SET_ERROR(EINVAL));
1651 1784          }
1652      -        uint64_t val;
     1785 +        uint64_t val = 0;
1653 1786          error = zap_lookup(dp->dp_meta_objset, ds->ds_object,
1654 1787              DS_FIELD_RESUME_TOGUID, sizeof (val), 1, &val);
1655 1788          if (error != 0 || drrb->drr_toguid != val) {
1656 1789                  dsl_dataset_rele(ds, FTAG);
1657 1790                  return (SET_ERROR(EINVAL));
1658 1791          }
1659 1792  
1660 1793          /*
1661 1794           * Check if the receive is still running.  If so, it will be owned.
1662 1795           * Note that nothing else can own the dataset (e.g. after the receive
... 68 lines elided ...
1731 1864  
1732 1865          spa_history_log_internal_ds(ds, "resume receive", tx, "");
1733 1866  }
1734 1867  
1735 1868  /*
1736 1869   * NB: callers *MUST* call dmu_recv_stream() if dmu_recv_begin()
1737 1870   * succeeds; otherwise we will leak the holds on the datasets.
1738 1871   */
1739 1872  int
1740 1873  dmu_recv_begin(char *tofs, char *tosnap, dmu_replay_record_t *drr_begin,
1741      -    boolean_t force, boolean_t resumable, char *origin, dmu_recv_cookie_t *drc)
     1874 +    boolean_t force, boolean_t resumable, boolean_t force_cksum,
     1875 +    char *origin, dmu_recv_cookie_t *drc)
1742 1876  {
1743 1877          dmu_recv_begin_arg_t drba = { 0 };
1744 1878  
1745 1879          bzero(drc, sizeof (dmu_recv_cookie_t));
1746 1880          drc->drc_drr_begin = drr_begin;
1747 1881          drc->drc_drrb = &drr_begin->drr_u.drr_begin;
1748 1882          drc->drc_tosnap = tosnap;
1749 1883          drc->drc_tofs = tofs;
1750 1884          drc->drc_force = force;
1751 1885          drc->drc_resumable = resumable;
1752 1886          drc->drc_cred = CRED();
1753 1887  
1754 1888          if (drc->drc_drrb->drr_magic == BSWAP_64(DMU_BACKUP_MAGIC)) {
1755 1889                  drc->drc_byteswap = B_TRUE;
1756      -                (void) fletcher_4_incremental_byteswap(drr_begin,
1757      -                    sizeof (dmu_replay_record_t), &drc->drc_cksum);
1758      -                byteswap_record(drr_begin);
     1890 +
     1891 +                /* on-wire checksum can be disabled for krrp */
     1892 +                if (force_cksum) {
     1893 +                        (void) fletcher_4_incremental_byteswap(drr_begin,
     1894 +                            sizeof (dmu_replay_record_t), &drc->drc_cksum);
     1895 +                        byteswap_record(drr_begin);
     1896 +                }
1759 1897          } else if (drc->drc_drrb->drr_magic == DMU_BACKUP_MAGIC) {
1760      -                (void) fletcher_4_incremental_native(drr_begin,
1761      -                    sizeof (dmu_replay_record_t), &drc->drc_cksum);
     1898 +                /* on-wire checksum can be disabled for krrp */
     1899 +                if (force_cksum) {
     1900 +                        (void) fletcher_4_incremental_native(drr_begin,
     1901 +                            sizeof (dmu_replay_record_t), &drc->drc_cksum);
     1902 +                }
1762 1903          } else {
1763 1904                  return (SET_ERROR(EINVAL));
1764 1905          }
1765 1906  
1766 1907          drba.drba_origin = origin;
1767 1908          drba.drba_cookie = drc;
1768 1909          drba.drba_cred = CRED();
1769 1910  
1770 1911          if (DMU_GET_FEATUREFLAGS(drc->drc_drrb->drr_versioninfo) &
1771 1912              DMU_BACKUP_FEATURE_RESUMING) {
... 63 lines elided ...
1835 1976          uint64_t bytes_read;
1836 1977          /*
1837 1978           * A record that has had its payload read in, but hasn't yet been handed
1838 1979           * off to the worker thread.
1839 1980           */
1840 1981          struct receive_record_arg *rrd;
1841 1982          /* A record that has had its header read in, but not its payload. */
1842 1983          struct receive_record_arg *next_rrd;
1843 1984          zio_cksum_t cksum;
1844 1985          zio_cksum_t prev_cksum;
     1986 +        dmu_krrp_task_t *krrp_task;
1845 1987          int err;
1846 1988          boolean_t byteswap;
1847 1989          /* Sorted list of objects not to issue prefetches for. */
1848 1990          struct objlist ignore_objlist;
1849 1991  };
1850 1992  
1851 1993  typedef struct guid_map_entry {
1852 1994          uint64_t        guid;
1853 1995          dsl_dataset_t   *gme_ds;
1854 1996          avl_node_t      avlnode;
... 32 lines elided ...
1887 2029  receive_read(struct receive_arg *ra, int len, void *buf)
1888 2030  {
1889 2031          int done = 0;
1890 2032  
1891 2033          /*
1892 2034           * The code doesn't rely on this (lengths being multiples of 8).  See
1893 2035           * comment in dump_bytes.
1894 2036           */
1895 2037          ASSERT0(len % 8);
1896 2038  
1897      -        while (done < len) {
1898      -                ssize_t resid;
     2039 +        /*
     2040 +         * if vp is NULL, then the send is from krrp and we can try to bypass
     2041 +         * copying data to an intermediate buffer.
     2042 +         */
     2043 +        if (ra->vp != NULL) {
     2044 +                while (done < len) {
     2045 +                        ssize_t resid = 0;
1899 2046  
1900      -                ra->err = vn_rdwr(UIO_READ, ra->vp,
1901      -                    (char *)buf + done, len - done,
1902      -                    ra->voff, UIO_SYSSPACE, FAPPEND,
1903      -                    RLIM64_INFINITY, CRED(), &resid);
1904      -
1905      -                if (resid == len - done) {
1906      -                        /*
1907      -                         * Note: ECKSUM indicates that the receive
1908      -                         * was interrupted and can potentially be resumed.
1909      -                         */
1910      -                        ra->err = SET_ERROR(ECKSUM);
     2047 +                        ra->err = vn_rdwr(UIO_READ, ra->vp,
     2048 +                            (char *)buf + done, len - done,
     2049 +                            ra->voff, UIO_SYSSPACE, FAPPEND,
     2050 +                            RLIM64_INFINITY, CRED(), &resid);
     2051 +                        if (resid == len - done) {
     2052 +                                /*
     2053 +                                 * Note: ECKSUM indicates that the receive was
     2054 +                                 * interrupted and can potentially be resumed.
     2055 +                                 */
     2056 +                                ra->err = SET_ERROR(ECKSUM);
     2057 +                        }
     2058 +                        ra->voff += len - done - resid;
     2059 +                        done = len - resid;
     2060 +                        if (ra->err != 0)
     2061 +                                return (ra->err);
1911 2062                  }
1912      -                ra->voff += len - done - resid;
1913      -                done = len - resid;
     2063 +        } else {
     2064 +                ASSERT(ra->krrp_task != NULL);
     2065 +                ra->err = dmu_krrp_buffer_read(buf, len, ra->krrp_task);
1914 2066                  if (ra->err != 0)
1915 2067                          return (ra->err);
     2068 +
     2069 +                done = len;
1916 2070          }
1917 2071  
1918 2072          ra->bytes_read += len;
1919 2073  
1920 2074          ASSERT3U(done, ==, len);
1921 2075          return (0);
1922 2076  }
1923 2077  
1924 2078  static void
1925 2079  byteswap_record(dmu_replay_record_t *drr)
... 285 lines elided ...
2211 2365  
2212 2366          tx = dmu_tx_create(rwa->os);
2213 2367  
2214 2368          dmu_tx_hold_write(tx, drrw->drr_object,
2215 2369              drrw->drr_offset, drrw->drr_logical_size);
2216 2370          err = dmu_tx_assign(tx, TXG_WAIT);
2217 2371          if (err != 0) {
2218 2372                  dmu_tx_abort(tx);
2219 2373                  return (err);
2220 2374          }
     2375 +
2221 2376          if (rwa->byteswap) {
2222 2377                  dmu_object_byteswap_t byteswap =
2223 2378                      DMU_OT_BYTESWAP(drrw->drr_type);
2224 2379                  dmu_ot_byteswap[byteswap].ob_func(abuf->b_data,
2225 2380                      DRR_WRITE_PAYLOAD_SIZE(drrw));
2226 2381          }
2227 2382  
2228 2383          /* use the bonus buf to look up the dnode in dmu_assign_arcbuf */
2229 2384          dmu_buf_t *bonus;
2230 2385          if (dmu_bonus_hold(rwa->os, drrw->drr_object, FTAG, &bonus) != 0)
... 209 lines elided ...
2440 2595   * Read the payload into a buffer of size len, and update the current record's
2441 2596   * payload field.
2442 2597   * Allocate ra->next_rrd and read the next record's header into
2443 2598   * ra->next_rrd->header.
2444 2599   * Verify checksum of payload and next record.
2445 2600   */
2446 2601  static int
2447 2602  receive_read_payload_and_next_header(struct receive_arg *ra, int len, void *buf)
2448 2603  {
2449 2604          int err;
     2605 +        boolean_t checksum_enable = (ra->krrp_task == NULL ||
     2606 +            ra->krrp_task->buffer_args.force_cksum);
2450 2607  
2451 2608          if (len != 0) {
2452 2609                  ASSERT3U(len, <=, SPA_MAXBLOCKSIZE);
2453 2610                  err = receive_read(ra, len, buf);
2454 2611                  if (err != 0)
2455 2612                          return (err);
2456 2613                  receive_cksum(ra, len, buf);
2457 2614  
2458 2615                  /* note: rrd is NULL when reading the begin record's payload */
2459 2616                  if (ra->rrd != NULL) {
... 13 lines elided ...
2473 2630                  kmem_free(ra->next_rrd, sizeof (*ra->next_rrd));
2474 2631                  ra->next_rrd = NULL;
2475 2632                  return (err);
2476 2633          }
2477 2634          if (ra->next_rrd->header.drr_type == DRR_BEGIN) {
2478 2635                  kmem_free(ra->next_rrd, sizeof (*ra->next_rrd));
2479 2636                  ra->next_rrd = NULL;
2480 2637                  return (SET_ERROR(EINVAL));
2481 2638          }
2482 2639  
2483      -        /*
2484      -         * Note: checksum is of everything up to but not including the
2485      -         * checksum itself.
2486      -         */
2487      -        ASSERT3U(offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
2488      -            ==, sizeof (dmu_replay_record_t) - sizeof (zio_cksum_t));
2489      -        receive_cksum(ra,
2490      -            offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
2491      -            &ra->next_rrd->header);
     2640 +        if (checksum_enable) {
     2641 +                /*
     2642 +                 * Note: checksum is of everything up to but not including the
     2643 +                 * checksum itself.
     2644 +                 */
     2645 +                ASSERT3U(offsetof(dmu_replay_record_t,
     2646 +                    drr_u.drr_checksum.drr_checksum),
     2647 +                    ==, sizeof (dmu_replay_record_t) - sizeof (zio_cksum_t));
     2648 +                receive_cksum(ra,
     2649 +                    offsetof(dmu_replay_record_t,
     2650 +                    drr_u.drr_checksum.drr_checksum),
     2651 +                    &ra->next_rrd->header);
2492 2652  
2493      -        zio_cksum_t cksum_orig =
2494      -            ra->next_rrd->header.drr_u.drr_checksum.drr_checksum;
2495      -        zio_cksum_t *cksump =
2496      -            &ra->next_rrd->header.drr_u.drr_checksum.drr_checksum;
     2653 +                zio_cksum_t cksum_orig =
     2654 +                    ra->next_rrd->header.drr_u.drr_checksum.drr_checksum;
     2655 +                zio_cksum_t *cksump =
     2656 +                    &ra->next_rrd->header.drr_u.drr_checksum.drr_checksum;
2497 2657  
2498      -        if (ra->byteswap)
2499      -                byteswap_record(&ra->next_rrd->header);
     2658 +                if (ra->byteswap)
     2659 +                        byteswap_record(&ra->next_rrd->header);
2500 2660  
2501      -        if ((!ZIO_CHECKSUM_IS_ZERO(cksump)) &&
2502      -            !ZIO_CHECKSUM_EQUAL(ra->cksum, *cksump)) {
2503      -                kmem_free(ra->next_rrd, sizeof (*ra->next_rrd));
2504      -                ra->next_rrd = NULL;
2505      -                return (SET_ERROR(ECKSUM));
     2661 +                if ((!ZIO_CHECKSUM_IS_ZERO(cksump)) &&
     2662 +                    !ZIO_CHECKSUM_EQUAL(ra->cksum, *cksump)) {
     2663 +                        kmem_free(ra->next_rrd, sizeof (*ra->next_rrd));
     2664 +                        ra->next_rrd = NULL;
     2665 +                        return (SET_ERROR(ECKSUM));
     2666 +                }
     2667 +
     2668 +                receive_cksum(ra, sizeof (cksum_orig), &cksum_orig);
2506 2669          }
2507 2670  
2508      -        receive_cksum(ra, sizeof (cksum_orig), &cksum_orig);
2509      -
2510 2671          return (0);
2511 2672  }
2512 2673  
2513 2674  static void
2514 2675  objlist_create(struct objlist *list)
2515 2676  {
2516 2677          list_create(&list->list, sizeof (struct receive_objnode),
2517 2678              offsetof(struct receive_objnode, node));
2518 2679          list->last_lookup = 0;
2519 2680  }
... 175 lines elided ...
2695 2856          {
2696 2857                  /*
2697 2858                   * It might be beneficial to prefetch indirect blocks here, but
2698 2859                   * we don't really have the data to decide for sure.
2699 2860                   */
2700 2861                  err = receive_read_payload_and_next_header(ra, 0, NULL);
2701 2862                  return (err);
2702 2863          }
2703 2864          case DRR_END:
2704 2865          {
2705      -                struct drr_end *drre = &ra->rrd->header.drr_u.drr_end;
2706      -                if (!ZIO_CHECKSUM_EQUAL(ra->prev_cksum, drre->drr_checksum))
2707      -                        return (SET_ERROR(ECKSUM));
     2866 +                if (ra->krrp_task == NULL ||
     2867 +                    ra->krrp_task->buffer_args.force_cksum) {
     2868 +                        struct drr_end *drre = &ra->rrd->header.drr_u.drr_end;
     2869 +                        if (!ZIO_CHECKSUM_EQUAL(ra->prev_cksum,
     2870 +                            drre->drr_checksum))
     2871 +                                return (SET_ERROR(ECKSUM));
     2872 +                }
2708 2873                  return (0);
2709 2874          }
2710 2875          case DRR_SPILL:
2711 2876          {
2712 2877                  struct drr_spill *drrs = &ra->rrd->header.drr_u.drr_spill;
2713 2878                  void *buf = kmem_zalloc(drrs->drr_length, KM_SLEEP);
2714 2879                  err = receive_read_payload_and_next_header(ra, drrs->drr_length,
2715 2880                      buf);
2716 2881                  if (err != 0)
2717 2882                          kmem_free(buf, drrs->drr_length);
... 145 lines elided ...
2863 3028   * prefetches for any necessary indirect blocks.  It will then push the records
2864 3029   * onto an internal blocking queue.  The worker thread will pull the records off
2865 3030   * the queue, and actually write the data into the DMU.  This way, the worker
2866 3031   * thread doesn't have to wait for reads to complete, since everything it needs
2867 3032   * (the indirect blocks) will be prefetched.
2868 3033   *
2869 3034   * NB: callers *must* call dmu_recv_end() if this succeeds.
2870 3035   */
2871 3036  int
2872 3037  dmu_recv_stream(dmu_recv_cookie_t *drc, vnode_t *vp, offset_t *voffp,
2873      -    int cleanup_fd, uint64_t *action_handlep)
     3038 +    int cleanup_fd, uint64_t *action_handlep, dmu_krrp_task_t *krrp_task)
2874 3039  {
2875 3040          int err = 0;
2876 3041          struct receive_arg ra = { 0 };
2877 3042          struct receive_writer_arg rwa = { 0 };
2878 3043          int featureflags;
2879 3044          nvlist_t *begin_nvl = NULL;
2880 3045  
2881 3046          ra.byteswap = drc->drc_byteswap;
2882 3047          ra.cksum = drc->drc_cksum;
2883 3048          ra.vp = vp;
2884 3049          ra.voff = *voffp;
     3050 +        ra.krrp_task = krrp_task;
2885 3051  
2886 3052          if (dsl_dataset_is_zapified(drc->drc_ds)) {
2887 3053                  (void) zap_lookup(drc->drc_ds->ds_dir->dd_pool->dp_meta_objset,
2888 3054                      drc->drc_ds->ds_object, DS_FIELD_RESUME_BYTES,
2889 3055                      sizeof (ra.bytes_read), 1, &ra.bytes_read);
2890 3056          }
2891 3057  
2892 3058          objlist_create(&ra.ignore_objlist);
2893 3059  
2894 3060          /* these were verified in dmu_recv_begin */
↓ open down ↓ 88 lines elided ↑ open up ↑
2983 3149           *
2984 3150           * We can leave this loop in 3 ways:  First, if rwa.err is
2985 3151           * non-zero.  In that case, the writer thread will free the rrd we just
2986 3152           * pushed.  Second, if  we're interrupted; in that case, either it's the
2987 3153           * first loop and ra.rrd was never allocated, or it's later, and ra.rrd
2988 3154           * has been handed off to the writer thread who will free it.  Finally,
2989 3155           * if receive_read_record fails or we're at the end of the stream, then
2990 3156           * we free ra.rrd and exit.
2991 3157           */
2992 3158          while (rwa.err == 0) {
2993      -                if (issig(JUSTLOOKING) && issig(FORREAL)) {
     3159 +                if (vp && issig(JUSTLOOKING) && issig(FORREAL)) {
2994 3160                          err = SET_ERROR(EINTR);
2995 3161                          break;
2996 3162                  }
2997 3163  
2998 3164                  ASSERT3P(ra.rrd, ==, NULL);
2999 3165                  ra.rrd = ra.next_rrd;
3000 3166                  ra.next_rrd = NULL;
3001 3167                  /* Allocates and loads header into ra.next_rrd */
3002 3168                  err = receive_read_record(&ra);
3003 3169  
... 45 lines elided ...
3049 3215  
3050 3216  static int
3051 3217  dmu_recv_end_check(void *arg, dmu_tx_t *tx)
3052 3218  {
3053 3219          dmu_recv_cookie_t *drc = arg;
3054 3220          dsl_pool_t *dp = dmu_tx_pool(tx);
3055 3221          int error;
3056 3222  
3057 3223          ASSERT3P(drc->drc_ds->ds_owner, ==, dmu_recv_tag);
3058 3224  
     3225 +        if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_WBC)) {
     3226 +                objset_t *os = NULL;
     3227 +
     3228 +                error = dmu_objset_from_ds(drc->drc_ds, &os);
     3229 +                if (error)
     3230 +                        return (error);
     3231 +
     3232 +                /* Recv is impossible into DS that uses WBC */
     3233 +                if (os->os_wbc_mode != ZFS_WBC_MODE_OFF)
     3234 +                        return (SET_ERROR(EKZFS_WBCNOTSUP));
     3235 +        }
     3236 +
3059 3237          if (!drc->drc_newfs) {
3060 3238                  dsl_dataset_t *origin_head;
3061 3239  
3062 3240                  error = dsl_dataset_hold(dp, drc->drc_tofs, FTAG, &origin_head);
3063 3241                  if (error != 0)
3064 3242                          return (error);
3065 3243                  if (drc->drc_force) {
3066 3244                          /*
3067 3245                           * We will destroy any snapshots in tofs (i.e. before
3068 3246                           * origin_head) that are after the origin (which is
... 36 lines elided ...
3105 3283                      drc->drc_tosnap, tx, B_TRUE, 1, drc->drc_cred);
3106 3284                  dsl_dataset_rele(origin_head, FTAG);
3107 3285                  if (error != 0)
3108 3286                          return (error);
3109 3287  
3110 3288                  error = dsl_destroy_head_check_impl(drc->drc_ds, 1);
3111 3289          } else {
3112 3290                  error = dsl_dataset_snapshot_check_impl(drc->drc_ds,
3113 3291                      drc->drc_tosnap, tx, B_TRUE, 1, drc->drc_cred);
3114 3292          }
     3293 +
     3294 +        if (dmu_tx_is_syncing(tx) && drc->drc_krrp_task != NULL) {
     3295 +                const char *token =
     3296 +                    drc->drc_krrp_task->buffer_args.to_ds;
     3297 +                const char *cookie = drc->drc_krrp_task->cookie;
     3298 +                dsl_pool_t *dp = tx->tx_pool;
     3299 +
     3300 +                if (*token != '\0') {
     3301 +                        error = zap_update(dp->dp_meta_objset,
     3302 +                            DMU_POOL_DIRECTORY_OBJECT, token, 1,
     3303 +                            strlen(cookie) + 1, cookie, tx);
     3304 +                }
     3305 +        }
3115 3306          return (error);
3116 3307  }
3117 3308  
3118 3309  static void
3119 3310  dmu_recv_end_sync(void *arg, dmu_tx_t *tx)
3120 3311  {
3121 3312          dmu_recv_cookie_t *drc = arg;
3122 3313          dsl_pool_t *dp = dmu_tx_pool(tx);
3123 3314  
3124 3315          spa_history_log_internal_ds(drc->drc_ds, "finish receiving",
... 186 lines elided ...