NEX-15281 zfs_panic_recover() during hpr disable/enable
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-13629 zfs send -s: assertion failed: err != 0 || (dsp->dsa_sent_begin && dsp->dsa_sent_end), file: ../../common/fs/zfs/dmu_send.c, line: 1010
Reviewed by: Alex Deiter <alex.deiter@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-9575 zfs send -s panics
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Revert "NEX-7251 Resume_token is not cleared right after finishing receive"
This reverts commit 9e97a45e8cf6ca59307a39e2d3c11c6e845e4187.
NEX-7251 Resume_token is not cleared right after finishing receive
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
NEX-5928 KRRP: Integrate illumos/openzfs resume-token, to resume replication from a given synced offset
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-5795 Rename 'wrc' as 'wbc' in the source and in the tech docs
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
NEX-5272 KRRP: replicate snapshot properties
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Alexey Komarov <alexey.komarov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-5270 WBC: Incorrect error message when trying to 'zfs recv' into wrcached dataset
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
NEX-5132 WBC: Do not allow recv to datasets with enabled writecache
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
6358 A faulted pool with only unavailable vdevs triggers assertion failure in libzfs
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Reviewed by: Serban Maduta <serban.maduta@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
6393 zfs receive a full send as a clone
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Dan McDonald <danmcd@omniti.com>
2605 want to resume interrupted zfs send
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Arne Jansen <sensille@gmx.net>
Approved by: Dan McDonald <danmcd@omniti.com>
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R (fix studio build)
4185 add new cryptographic checksums to ZFS: SHA-512, Skein, Edon-R
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Richard Lowe <richlowe@richlowe.net>
Approved by: Garrett D'Amore <garrett@damore.org>
6047 SPARC boot should support feature@embedded_data
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5946 zfs_ioc_space_snaps must check that firstsnap and lastsnap refer to snapshots
5945 zfs_ioc_send_space must ensure that fromsnap refers to a snapshot
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
5870 dmu_recv_end_check() leaks origin_head hold if error happens in drc_force branch
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Andrew Stormont <andyjstormont@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5912 full stream can not be force-received into a dataset if it has a snapshot
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5809 Blowaway full receive in v1 pool causes kernel panic
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
5746 more checksumming in zfs send
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@omniti.com>
5765 add support for estimating send stream size with lzc_send_space when source is a bookmark
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Steven Hartland <killing@multiplay.co.uk>
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Approved by: Albert Lee <trisk@nexenta.com>
5769 Cast 'zfs bad bloc' to ULL for x86
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <paul.dagnelie@delphix.com>
Reviewed by: Richard PALO <richard@NetBSD.org>
Approved by: Dan McDonald <danmcd@omniti.com>
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
Revert "NEX-4476 WRC: Allow to use write back cache per tree of datasets"
This reverts commit fe97b74444278a6f36fec93179133641296312da.
NEX-4476 WRC: Allow to use write back cache per tree of datasets
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-3588 krrp panics in zfs:dmu_recv_end_check+13b () when running zfs tests.
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Kevin Crowe <kevin.crowe@nexenta.com>
NEX-3558 KRRP Integration
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Fixup merge results
re #12619 rb4429 More dp->dp_config_rwlock holds
Bug 10481 - Dry run option in 'zfs send' isn't the same as in NexentaStor 3.1

@@ -18,14 +18,14 @@
  *
  * CDDL HEADER END
  */
 /*
  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
- * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
  * Copyright (c) 2011, 2015 by Delphix. All rights reserved.
  * Copyright (c) 2014, Joyent, Inc. All rights reserved.
  * Copyright 2014 HybridCluster. All rights reserved.
+ * Copyright 2017 Nexenta Systems, Inc. All rights reserved.
  * Copyright 2016 RackTop Systems.
  * Copyright (c) 2014 Integros [integros.com]
  */
 
 #include <sys/dmu.h>

@@ -52,12 +52,15 @@
 #include <sys/dmu_send.h>
 #include <sys/dsl_destroy.h>
 #include <sys/blkptr.h>
 #include <sys/dsl_bookmark.h>
 #include <sys/zfeature.h>
+#include <sys/autosnap.h>
 #include <sys/bqueue.h>
 
+#include "zfs_errno.h"
+
 /* Set this tunable to TRUE to replace corrupt data with 0x2f5baddb10c */
 int zfs_send_corrupt_data = B_FALSE;
 int zfs_send_queue_length = 16 * 1024 * 1024;
 int zfs_recv_queue_length = 16 * 1024 * 1024;
 /* Set this tunable to FALSE to disable setting of DRR_FLAG_FREERECORDS */

@@ -108,54 +111,87 @@
          * data that isn't 8-byte aligned; if the assertions were removed, a
          * feature flag would have to be added.
          */
 
         ASSERT0(len % 8);
+        ASSERT(buf != NULL);
 
+        dsp->dsa_err = 0;
+        if (!dsp->sendsize) {
+                /* if vp is NULL, then the send is from krrp */
+                if (dsp->dsa_vp != NULL) {
         dsp->dsa_err = vn_rdwr(UIO_WRITE, dsp->dsa_vp,
             (caddr_t)buf, len,
-            0, UIO_SYSSPACE, FAPPEND, RLIM64_INFINITY, CRED(), &resid);
-
+                            0, UIO_SYSSPACE, FAPPEND, RLIM64_INFINITY,
+                            CRED(), &resid);
+                } else {
+                        ASSERT(dsp->dsa_krrp_task != NULL);
+                        dsp->dsa_err = dmu_krrp_buffer_write(buf, len,
+                            dsp->dsa_krrp_task);
+                }
+        }
         mutex_enter(&ds->ds_sendstream_lock);
         *dsp->dsa_off += len;
         mutex_exit(&ds->ds_sendstream_lock);
 
         return (dsp->dsa_err);
 }
 
+static int
+dump_bytes_with_checksum(dmu_sendarg_t *dsp, void *buf, int len)
+{
+        if (!dsp->sendsize && (dsp->dsa_krrp_task == NULL ||
+            dsp->dsa_krrp_task->buffer_args.force_cksum)) {
+                (void) fletcher_4_incremental_native(buf, len, &dsp->dsa_zc);
+        }
+
+        return (dump_bytes(dsp, buf, len));
+}
+
 /*
  * For all record types except BEGIN, fill in the checksum (overlaid in
  * drr_u.drr_checksum.drr_checksum).  The checksum verifies everything
  * up to the start of the checksum itself.
  */
 static int
 dump_record(dmu_sendarg_t *dsp, void *payload, int payload_len)
 {
+        boolean_t do_checksum = (dsp->dsa_krrp_task == NULL ||
+            dsp->dsa_krrp_task->buffer_args.force_cksum);
+
         ASSERT3U(offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
             ==, sizeof (dmu_replay_record_t) - sizeof (zio_cksum_t));
-        (void) fletcher_4_incremental_native(dsp->dsa_drr,
-            offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
-            &dsp->dsa_zc);
+
         if (dsp->dsa_drr->drr_type == DRR_BEGIN) {
                 dsp->dsa_sent_begin = B_TRUE;
-        } else {
-                ASSERT(ZIO_CHECKSUM_IS_ZERO(&dsp->dsa_drr->drr_u.
-                    drr_checksum.drr_checksum));
-                dsp->dsa_drr->drr_u.drr_checksum.drr_checksum = dsp->dsa_zc;
         }
+
         if (dsp->dsa_drr->drr_type == DRR_END) {
                 dsp->dsa_sent_end = B_TRUE;
         }
+
+        if (!dsp->sendsize && do_checksum) {
+                (void) fletcher_4_incremental_native(dsp->dsa_drr,
+                    offsetof(dmu_replay_record_t,
+                    drr_u.drr_checksum.drr_checksum),
+                    &dsp->dsa_zc);
+                if (dsp->dsa_drr->drr_type != DRR_BEGIN) {
+                        ASSERT(ZIO_CHECKSUM_IS_ZERO(&dsp->dsa_drr->drr_u.
+                            drr_checksum.drr_checksum));
+                        dsp->dsa_drr->drr_u.drr_checksum.drr_checksum =
+                            dsp->dsa_zc;
+                }
+
         (void) fletcher_4_incremental_native(&dsp->dsa_drr->
             drr_u.drr_checksum.drr_checksum,
             sizeof (zio_cksum_t), &dsp->dsa_zc);
+        }
+
         if (dump_bytes(dsp, dsp->dsa_drr, sizeof (dmu_replay_record_t)) != 0)
                 return (SET_ERROR(EINTR));
         if (payload_len != 0) {
-                (void) fletcher_4_incremental_native(payload, payload_len,
-                    &dsp->dsa_zc);
-                if (dump_bytes(dsp, payload, payload_len) != 0)
+                if (dump_bytes_with_checksum(dsp, payload, payload_len) != 0)
                         return (SET_ERROR(EINTR));
         }
         return (0);
 }
 

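Note: the reworked dump_record() above relies on the fletcher-4 checksum being incremental — folding the record header and the payload into dsa_zc in separate calls yields the same digest as a single pass over the whole record, which is what allows the payload checksum to move into dump_bytes_with_checksum(). A minimal sketch of that property, with a simplified accumulator and illustrative names (not the kernel implementation):

```c
/*
 * Illustrative sketch only: a simplified incremental fletcher-4
 * accumulator (native byte order), modeled on the ZFS algorithm.
 * Because the four running sums carry over between calls, checksumming
 * a record in header + payload pieces equals checksumming it whole.
 */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct { uint64_t a, b, c, d; } cksum_t;

static void
fletcher4_incr(const void *buf, size_t size, cksum_t *zc)
{
	const uint32_t *ip = buf;
	const uint32_t *ipend = ip + (size / sizeof (uint32_t));

	/* fold each 32-bit word into the four running accumulators */
	for (; ip < ipend; ip++) {
		zc->a += *ip;
		zc->b += zc->a;
		zc->c += zc->b;
		zc->d += zc->c;
	}
}
```

Splitting is only valid at 4-byte boundaries, which the stream guarantees (see the ASSERT0(len % 8) above).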
@@ -359,13 +395,18 @@
                 return (EINTR);
         return (0);
 }
 
 static int
-dump_spill(dmu_sendarg_t *dsp, uint64_t object, int blksz, void *data)
+dump_spill(dmu_sendarg_t *dsp, uint64_t object,
+    const blkptr_t *bp, const zbookmark_phys_t *zb)
 {
+        int rc = 0;
         struct drr_spill *drrs = &(dsp->dsa_drr->drr_u.drr_spill);
+        enum arc_flags aflags = ARC_FLAG_WAIT;
+        int blksz = BP_GET_LSIZE(bp);
+        arc_buf_t *abuf;
 
         if (dsp->dsa_pending_op != PENDING_NONE) {
                 if (dump_record(dsp, NULL, 0) != 0)
                         return (SET_ERROR(EINTR));
                 dsp->dsa_pending_op = PENDING_NONE;

@@ -376,12 +417,42 @@
         dsp->dsa_drr->drr_type = DRR_SPILL;
         drrs->drr_object = object;
         drrs->drr_length = blksz;
         drrs->drr_toguid = dsp->dsa_toguid;
 
-        if (dump_record(dsp, data, blksz) != 0)
+        if (dump_record(dsp, NULL, 0))
                 return (SET_ERROR(EINTR));
+
+        /*
+         * if dsa_krrp_task is not NULL, then the send is from krrp and we can
+         * try to bypass copying data to an intermediate buffer.
+         */
+        if (!dsp->sendsize && dsp->dsa_krrp_task != NULL) {
+                rc = dmu_krrp_direct_arc_read(dsp->dsa_os->os_spa,
+                    dsp->dsa_krrp_task, &dsp->dsa_zc, bp);
+                /*
+                 * rc == 0 means that we successfully copied
+                 * the data directly from the ARC to the krrp buffer.
+                 * rc != 0 && rc != EINTR means that we cannot
+                 * zero-copy the data and must use the slow path.
+                 */
+                if (rc == 0 || rc == EINTR)
+                        return (rc);
+
+                ASSERT3U(rc, ==, ENODATA);
+        }
+
+        if (arc_read(NULL, dsp->dsa_os->os_spa, bp, arc_getbuf_func, &abuf,
+            ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL,
+            &aflags, zb) != 0)
+                return (SET_ERROR(EIO));
+
+        rc = dump_bytes_with_checksum(dsp, abuf->b_data, blksz);
+        arc_buf_destroy(abuf, &abuf);
+        if (rc != 0)
+                return (SET_ERROR(EINTR));
+
         return (0);
 }
 
 static int
 dump_freeobjects(dmu_sendarg_t *dsp, uint64_t firstobj, uint64_t numobjs)

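Note: dump_spill() above tries the krrp zero-copy path first and falls back to a buffered arc_read() only on ENODATA, while 0 and EINTR are final results. The dispatch pattern can be sketched as below; fast_path_read() and slow_path_read() are hypothetical stubs standing in for dmu_krrp_direct_arc_read() and the arc_read()/dump_bytes_with_checksum() pair:

```c
/*
 * Illustrative sketch of the fast-path/slow-path dispatch used by
 * dump_spill(). Stub names are invented for the example.
 */
#include <assert.h>
#include <errno.h>

static int
fast_path_read(int simulated_rc)
{
	return (simulated_rc);	/* 0, EINTR, or ENODATA */
}

static int
slow_path_read(void)
{
	return (0);		/* buffered copy succeeded */
}

static int
send_block(int fast_rc)
{
	int rc = fast_path_read(fast_rc);

	/* rc == 0: zero-copied; rc == EINTR: interrupted, stop */
	if (rc == 0 || rc == EINTR)
		return (rc);

	/* anything else must be ENODATA: zero-copy not possible */
	assert(rc == ENODATA);
	return (slow_path_read());
}
```

This keeps the slow path as the single fallback, so any new fast-path failure mode has to be classified explicitly.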
@@ -634,21 +705,16 @@
                         if (err != 0)
                                 break;
                 }
                 arc_buf_destroy(abuf, &abuf);
         } else if (type == DMU_OT_SA) {
-                arc_flags_t aflags = ARC_FLAG_WAIT;
-                arc_buf_t *abuf;
-                int blksz = BP_GET_LSIZE(bp);
-
-                if (arc_read(NULL, spa, bp, arc_getbuf_func, &abuf,
-                    ZIO_PRIORITY_ASYNC_READ, ZIO_FLAG_CANFAIL,
-                    &aflags, zb) != 0)
-                        return (SET_ERROR(EIO));
-
-                err = dump_spill(dsa, zb->zb_object, blksz, abuf->b_data);
-                arc_buf_destroy(abuf, &abuf);
+                /*
+                 * The upstream code has an arc_read() call here, but we
+                 * moved it into dump_spill() to take advantage of
+                 * zero-copying the buffer when possible.
+                 */
+                err = dump_spill(dsa, zb->zb_object, bp, zb);
         } else if (backup_do_embed(dsa, bp)) {
                 /* it's an embedded level-0 block of a regular object */
                 int blksz = dblkszsec << SPA_MINBLOCKSHIFT;
                 ASSERT0(zb->zb_level);
                 err = dump_write_embedded(dsa, zb->zb_object,

@@ -685,15 +751,10 @@
                 ASSERT0(zb->zb_level);
                 ASSERT(zb->zb_object > dsa->dsa_resume_object ||
                     (zb->zb_object == dsa->dsa_resume_object &&
                     zb->zb_blkid * blksz >= dsa->dsa_resume_offset));
 
-                ASSERT0(zb->zb_level);
-                ASSERT(zb->zb_object > dsa->dsa_resume_object ||
-                    (zb->zb_object == dsa->dsa_resume_object &&
-                    zb->zb_blkid * blksz >= dsa->dsa_resume_offset));
-
                 ASSERT3U(blksz, ==, BP_GET_LSIZE(bp));
 
                 enum zio_flag zioflags = ZIO_FLAG_CANFAIL;
                 if (request_compressed)
                         zioflags |= ZIO_FLAG_RAW;

@@ -722,11 +783,10 @@
                         while (blksz > 0 && err == 0) {
                                 int n = MIN(blksz, SPA_OLD_MAXBLOCKSIZE);
                                 err = dump_write(dsa, type, zb->zb_object,
                                     offset, n, n, NULL, buf);
                                 offset += n;
-                                buf += n;
                                 blksz -= n;
                         }
                 } else {
                         err = dump_write(dsa, type, zb->zb_object, offset,
                             blksz, arc_buf_size(abuf), bp, abuf->b_data);

@@ -753,15 +813,15 @@
  * Actually do the bulk of the work in a zfs send.
  *
  * Note: Releases dp using the specified tag.
  */
 static int
-dmu_send_impl(void *tag, dsl_pool_t *dp, dsl_dataset_t *to_ds,
+dmu_send_impl_ss(void *tag, dsl_pool_t *dp, dsl_dataset_t *to_ds,
     zfs_bookmark_phys_t *ancestor_zb, boolean_t is_clone,
     boolean_t embedok, boolean_t large_block_ok, boolean_t compressok,
-    int outfd, uint64_t resumeobj, uint64_t resumeoff,
-    vnode_t *vp, offset_t *off)
+    int outfd, uint64_t resumeobj, uint64_t resumeoff, vnode_t *vp,
+    offset_t *off, boolean_t sendsize, dmu_krrp_task_t *krrp_task)
 {
         objset_t *os;
         dmu_replay_record_t *drr;
         dmu_sendarg_t *dsp;
         int err;

@@ -848,12 +908,14 @@
         dsp->dsa_outfd = outfd;
         dsp->dsa_proc = curproc;
         dsp->dsa_os = os;
         dsp->dsa_off = off;
         dsp->dsa_toguid = dsl_dataset_phys(to_ds)->ds_guid;
+        dsp->dsa_krrp_task = krrp_task;
         dsp->dsa_pending_op = PENDING_NONE;
         dsp->dsa_featureflags = featureflags;
+        dsp->sendsize = sendsize;
         dsp->dsa_resume_object = resumeobj;
         dsp->dsa_resume_offset = resumeoff;
 
         mutex_enter(&to_ds->ds_sendstream_lock);
         list_insert_head(&to_ds->ds_sendstreams, dsp);

@@ -901,11 +963,11 @@
         to_data = bqueue_dequeue(&to_arg.q);
 
         while (!to_data->eos_marker && err == 0) {
                 err = do_dump(dsp, to_data);
                 to_data = get_next_record(&to_arg.q, to_data);
-                if (issig(JUSTLOOKING) && issig(FORREAL))
+                if (vp != NULL && issig(JUSTLOOKING) && issig(FORREAL))
                         err = EINTR;
         }
 
         if (err != 0) {
                 to_arg.cancel = B_TRUE;

@@ -955,13 +1017,25 @@
 
         return (err);
 }
 
 int
+dmu_send_impl(void *tag, dsl_pool_t *dp, dsl_dataset_t *to_ds,
+    zfs_bookmark_phys_t *ancestor_zb, boolean_t is_clone, boolean_t embedok,
+    boolean_t large_block_ok, boolean_t compressok, int outfd,
+    uint64_t resumeobj, uint64_t resumeoff, vnode_t *vp, offset_t *off,
+    dmu_krrp_task_t *krrp_task)
+{
+        return (dmu_send_impl_ss(tag, dp, to_ds, ancestor_zb, is_clone,
+            embedok, large_block_ok, compressok, outfd, resumeobj, resumeoff,
+            vp, off, B_FALSE, krrp_task));
+}
+
+int
 dmu_send_obj(const char *pool, uint64_t tosnap, uint64_t fromsnap,
     boolean_t embedok, boolean_t large_block_ok, boolean_t compressok,
-    int outfd, vnode_t *vp, offset_t *off)
+    int outfd, vnode_t *vp, offset_t *off, boolean_t sendsize)
 {
         dsl_pool_t *dp;
         dsl_dataset_t *ds;
         dsl_dataset_t *fromds = NULL;
         int err;

@@ -992,15 +1066,17 @@
                     dsl_dataset_phys(fromds)->ds_creation_time;
                 zb.zbm_creation_txg = dsl_dataset_phys(fromds)->ds_creation_txg;
                 zb.zbm_guid = dsl_dataset_phys(fromds)->ds_guid;
                 is_clone = (fromds->ds_dir != ds->ds_dir);
                 dsl_dataset_rele(fromds, FTAG);
-                err = dmu_send_impl(FTAG, dp, ds, &zb, is_clone,
-                    embedok, large_block_ok, compressok, outfd, 0, 0, vp, off);
+                err = dmu_send_impl_ss(FTAG, dp, ds, &zb, is_clone,
+                    embedok, large_block_ok, compressok, outfd, 0, 0, vp, off,
+                        sendsize, NULL);
         } else {
-                err = dmu_send_impl(FTAG, dp, ds, NULL, B_FALSE,
-                    embedok, large_block_ok, compressok, outfd, 0, 0, vp, off);
+                err = dmu_send_impl_ss(FTAG, dp, ds, NULL, B_FALSE,
+                    embedok, large_block_ok, compressok, outfd, 0, 0, vp, off,
+                        sendsize, NULL);
         }
         dsl_dataset_rele(ds, FTAG);
         return (err);
 }
 

@@ -1073,16 +1149,16 @@
                         dsl_dataset_rele(ds, FTAG);
                         dsl_pool_rele(dp, FTAG);
                         return (err);
                 }
                 err = dmu_send_impl(FTAG, dp, ds, &zb, is_clone,
-                    embedok, large_block_ok, compressok,
-                    outfd, resumeobj, resumeoff, vp, off);
+                    embedok, large_block_ok, compressok, outfd,
+                    resumeobj, resumeoff, vp, off, NULL);
         } else {
                 err = dmu_send_impl(FTAG, dp, ds, NULL, B_FALSE,
-                    embedok, large_block_ok, compressok,
-                    outfd, resumeobj, resumeoff, vp, off);
+                    embedok, large_block_ok, compressok, outfd,
+                    resumeobj, resumeoff, vp, off, NULL);
         }
         if (owned)
                 dsl_dataset_disown(ds, FTAG);
         else
                 dsl_dataset_rele(ds, FTAG);

@@ -1255,29 +1331,54 @@
         uint64_t drba_snapobj;
 } dmu_recv_begin_arg_t;
 
 static int
 recv_begin_check_existing_impl(dmu_recv_begin_arg_t *drba, dsl_dataset_t *ds,
-    uint64_t fromguid)
+    uint64_t fromguid, dmu_tx_t *tx)
 {
         uint64_t val;
         int error;
         dsl_pool_t *dp = ds->ds_dir->dd_pool;
 
+        if (dmu_tx_is_syncing(tx)) {
         /* temporary clone name must not exist */
         error = zap_lookup(dp->dp_meta_objset,
-            dsl_dir_phys(ds->ds_dir)->dd_child_dir_zapobj, recv_clone_name,
-            8, 1, &val);
-        if (error != ENOENT)
-                return (error == 0 ? EBUSY : error);
+                    dsl_dir_phys(ds->ds_dir)->dd_child_dir_zapobj,
+                    recv_clone_name, 8, 1, &val);
+                if (error == 0) {
+                        dsl_dataset_t *tds;
 
+                        /* check whether it is currently in use */
+                        error = dsl_dataset_own_obj(dp, val, FTAG, &tds);
+                        if (!error) {
+                                char name[ZFS_MAX_DATASET_NAME_LEN];
+
+                                dsl_dataset_name(tds, name);
+                                dsl_dataset_disown(tds, FTAG);
+
+                                error = dsl_dataset_hold(dp, name, FTAG, &tds);
+                                if (!error) {
+                                        dsl_destroy_head_sync_impl(tds, tx);
+                                        dsl_dataset_rele(tds, FTAG);
+                                        error = ENOENT;
+                                }
+                        } else {
+                                error = 0;
+                        }
+                }
+                if (error != ENOENT) {
+                        return (error == 0 ?
+                            SET_ERROR(EBUSY) : SET_ERROR(error));
+                }
+        }
+
         /* new snapshot name must not exist */
         error = zap_lookup(dp->dp_meta_objset,
             dsl_dataset_phys(ds)->ds_snapnames_zapobj,
             drba->drba_cookie->drc_tosnap, 8, 1, &val);
         if (error != ENOENT)
-                return (error == 0 ? EEXIST : error);
+                return (error == 0 ? SET_ERROR(EEXIST) : SET_ERROR(error));
 
         /*
          * Check snapshot limit before receiving. We'll recheck again at the
          * end, but might as well abort before receiving if we're already over
          * the limit.

@@ -1397,17 +1498,33 @@
 
         error = dsl_dataset_hold(dp, tofs, FTAG, &ds);
         if (error == 0) {
                 /* target fs already exists; recv into temp clone */
 
+                if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_WBC)) {
+                        objset_t *os = NULL;
+
+                        error = dmu_objset_from_ds(ds, &os);
+                        if (error) {
+                                dsl_dataset_rele(ds, FTAG);
+                                return (error);
+                        }
+
+                        /* Cannot receive into a dataset that uses WBC */
+                        if (os->os_wbc_mode != ZFS_WBC_MODE_OFF) {
+                                dsl_dataset_rele(ds, FTAG);
+                                return (SET_ERROR(EKZFS_WBCNOTSUP));
+                        }
+                }
+
                 /* Can't recv a clone into an existing fs */
                 if (flags & DRR_FLAG_CLONE || drba->drba_origin) {
                         dsl_dataset_rele(ds, FTAG);
                         return (SET_ERROR(EINVAL));
                 }
 
-                error = recv_begin_check_existing_impl(drba, ds, fromguid);
+                error = recv_begin_check_existing_impl(drba, ds, fromguid, tx);
                 dsl_dataset_rele(ds, FTAG);
         } else if (error == ENOENT) {
                 /* target fs does not exist; must be a full backup or clone */
                 char buf[ZFS_MAX_DATASET_NAME_LEN];
 

@@ -1433,10 +1550,26 @@
                 (void) strlcpy(buf, tofs, strrchr(tofs, '/') - tofs + 1);
                 error = dsl_dataset_hold(dp, buf, FTAG, &ds);
                 if (error != 0)
                         return (error);
 
+                if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_WBC)) {
+                        objset_t *os = NULL;
+
+                        error = dmu_objset_from_ds(ds, &os);
+                        if (error) {
+                                dsl_dataset_rele(ds, FTAG);
+                                return (error);
+                        }
+
+                        /* Cannot receive into a dataset that uses WBC */
+                        if (os->os_wbc_mode != ZFS_WBC_MODE_OFF) {
+                                dsl_dataset_rele(ds, FTAG);
+                                return (SET_ERROR(EKZFS_WBCNOTSUP));
+                        }
+                }
+
                 /*
                  * Check filesystem and snapshot limits before receiving. We'll
                  * recheck snapshot limits again at the end (we create the
                  * filesystems and increment those counts during begin_sync).
                  */

@@ -1647,11 +1780,11 @@
         /* check that there is resuming data, and that the toguid matches */
         if (!dsl_dataset_is_zapified(ds)) {
                 dsl_dataset_rele(ds, FTAG);
                 return (SET_ERROR(EINVAL));
         }
-        uint64_t val;
+        uint64_t val = 0;
         error = zap_lookup(dp->dp_meta_objset, ds->ds_object,
             DS_FIELD_RESUME_TOGUID, sizeof (val), 1, &val);
         if (error != 0 || drrb->drr_toguid != val) {
                 dsl_dataset_rele(ds, FTAG);
                 return (SET_ERROR(EINVAL));

@@ -1736,11 +1869,12 @@
  * NB: callers *MUST* call dmu_recv_stream() if dmu_recv_begin()
  * succeeds; otherwise we will leak the holds on the datasets.
  */
 int
 dmu_recv_begin(char *tofs, char *tosnap, dmu_replay_record_t *drr_begin,
-    boolean_t force, boolean_t resumable, char *origin, dmu_recv_cookie_t *drc)
+    boolean_t force, boolean_t resumable, boolean_t force_cksum,
+    char *origin, dmu_recv_cookie_t *drc)
 {
         dmu_recv_begin_arg_t drba = { 0 };
 
         bzero(drc, sizeof (dmu_recv_cookie_t));
         drc->drc_drr_begin = drr_begin;

@@ -1751,16 +1885,23 @@
         drc->drc_resumable = resumable;
         drc->drc_cred = CRED();
 
         if (drc->drc_drrb->drr_magic == BSWAP_64(DMU_BACKUP_MAGIC)) {
                 drc->drc_byteswap = B_TRUE;
+
+                /* on-wire checksum can be disabled for krrp */
+                if (force_cksum) {
                 (void) fletcher_4_incremental_byteswap(drr_begin,
                     sizeof (dmu_replay_record_t), &drc->drc_cksum);
                 byteswap_record(drr_begin);
+                }
         } else if (drc->drc_drrb->drr_magic == DMU_BACKUP_MAGIC) {
+                /* on-wire checksum can be disabled for krrp */
+                if (force_cksum) {
                 (void) fletcher_4_incremental_native(drr_begin,
                     sizeof (dmu_replay_record_t), &drc->drc_cksum);
+                }
         } else {
                 return (SET_ERROR(EINVAL));
         }
 
         drba.drba_origin = origin;

@@ -1840,10 +1981,11 @@
         struct receive_record_arg *rrd;
         /* A record that has had its header read in, but not its payload. */
         struct receive_record_arg *next_rrd;
         zio_cksum_t cksum;
         zio_cksum_t prev_cksum;
+        dmu_krrp_task_t *krrp_task;
         int err;
         boolean_t byteswap;
         /* Sorted list of objects not to issue prefetches for. */
         struct objlist ignore_objlist;
 };

@@ -1892,31 +2034,43 @@
          * The code doesn't rely on this (lengths being multiples of 8).  See
          * comment in dump_bytes.
          */
         ASSERT0(len % 8);
 
+        /*
+         * if vp is NULL, then the receive is from krrp and we can bypass
+         * copying data to an intermediate buffer.
+         */
+        if (ra->vp != NULL) {
         while (done < len) {
-                ssize_t resid;
+                        ssize_t resid = 0;
 
                 ra->err = vn_rdwr(UIO_READ, ra->vp,
                     (char *)buf + done, len - done,
                     ra->voff, UIO_SYSSPACE, FAPPEND,
                     RLIM64_INFINITY, CRED(), &resid);
-
                 if (resid == len - done) {
                         /*
-                         * Note: ECKSUM indicates that the receive
-                         * was interrupted and can potentially be resumed.
+                                 * Note: ECKSUM indicates that the receive was
+                                 * interrupted and can potentially be resumed.
                          */
                         ra->err = SET_ERROR(ECKSUM);
                 }
                 ra->voff += len - done - resid;
                 done = len - resid;
                 if (ra->err != 0)
                         return (ra->err);
         }
+        } else {
+                ASSERT(ra->krrp_task != NULL);
+                ra->err = dmu_krrp_buffer_read(buf, len, ra->krrp_task);
+                if (ra->err != 0)
+                        return (ra->err);
 
+                done = len;
+        }
+
         ra->bytes_read += len;
 
         ASSERT3U(done, ==, len);
         return (0);
 }

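Note: the vnode branch of receive_read() above is a read-fully loop — vn_rdwr() may return short reads, so the loop keeps issuing reads until len bytes arrive, and a read that makes no progress (resid == remaining) marks a truncated stream. A self-contained sketch of the same loop over an in-memory source (all names are illustrative):

```c
/*
 * Illustrative read-fully loop in the style of receive_read().
 * src_read() stands in for vn_rdwr(): it may transfer fewer bytes
 * than requested and reports the shortfall as a residual count.
 */
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct src { const char *data; size_t size, off; };

static size_t
src_read(struct src *s, char *buf, size_t want)
{
	size_t n = s->size - s->off;

	if (n > want)
		n = want;
	if (n > 3)
		n = 3;			/* force short reads for the demo */
	memcpy(buf, s->data + s->off, n);
	s->off += n;
	return (want - n);		/* residual, like vn_rdwr's resid */
}

static int
read_fully(struct src *s, char *buf, size_t len)
{
	size_t done = 0;

	while (done < len) {
		size_t resid = src_read(s, buf + done, len - done);

		if (resid == len - done)
			return (-1);	/* no progress: truncated stream */
		done = len - resid;
	}
	return (0);
}
```

In the kernel code the no-progress case is reported as ECKSUM, which (per the comment above) signals an interrupted, potentially resumable receive.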
@@ -2216,10 +2370,11 @@
         err = dmu_tx_assign(tx, TXG_WAIT);
         if (err != 0) {
                 dmu_tx_abort(tx);
                 return (err);
         }
+
         if (rwa->byteswap) {
                 dmu_object_byteswap_t byteswap =
                     DMU_OT_BYTESWAP(drrw->drr_type);
                 dmu_ot_byteswap[byteswap].ob_func(abuf->b_data,
                     DRR_WRITE_PAYLOAD_SIZE(drrw));

@@ -2445,10 +2600,12 @@
  */
 static int
 receive_read_payload_and_next_header(struct receive_arg *ra, int len, void *buf)
 {
         int err;
+        boolean_t checksum_enable = (ra->krrp_task == NULL ||
+            ra->krrp_task->buffer_args.force_cksum);
 
         if (len != 0) {
                 ASSERT3U(len, <=, SPA_MAXBLOCKSIZE);
                 err = receive_read(ra, len, buf);
                 if (err != 0)

@@ -2478,18 +2635,21 @@
                 kmem_free(ra->next_rrd, sizeof (*ra->next_rrd));
                 ra->next_rrd = NULL;
                 return (SET_ERROR(EINVAL));
         }
 
+        if (checksum_enable) {
         /*
          * Note: checksum is of everything up to but not including the
          * checksum itself.
          */
-        ASSERT3U(offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
+                ASSERT3U(offsetof(dmu_replay_record_t,
+                    drr_u.drr_checksum.drr_checksum),
             ==, sizeof (dmu_replay_record_t) - sizeof (zio_cksum_t));
         receive_cksum(ra,
-            offsetof(dmu_replay_record_t, drr_u.drr_checksum.drr_checksum),
+                    offsetof(dmu_replay_record_t,
+                    drr_u.drr_checksum.drr_checksum),
             &ra->next_rrd->header);
 
         zio_cksum_t cksum_orig =
             ra->next_rrd->header.drr_u.drr_checksum.drr_checksum;
         zio_cksum_t *cksump =

@@ -2504,10 +2664,11 @@
                 ra->next_rrd = NULL;
                 return (SET_ERROR(ECKSUM));
         }
 
         receive_cksum(ra, sizeof (cksum_orig), &cksum_orig);
+        }
 
         return (0);
 }

As a sketch of the checksum gate introduced in the hunk above: verification still runs for every ordinary (vnode) receive, and for a krrp receive only when the task requests it via force_cksum. A self-contained model (struct and function names are illustrative, not the kernel types):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Illustrative stand-in for dmu_krrp_task_t's buffer_args.force_cksum. */
struct krrp_task { bool force_cksum; };

static bool
checksum_enabled(const struct krrp_task *krrp_task)
{
	/* mirrors: krrp_task == NULL || buffer_args.force_cksum */
	return (krrp_task == NULL || krrp_task->force_cksum);
}

int
main(void)
{
	struct krrp_task forced = { true };
	struct krrp_task unforced = { false };

	assert(checksum_enabled(NULL));        /* ordinary receive */
	assert(checksum_enabled(&forced));     /* krrp, cksum requested */
	assert(!checksum_enabled(&unforced));  /* krrp, cksum skipped */
	printf("ok\n");
	return (0);
}
```
 
 static void

@@ -2700,13 +2861,17 @@
                 err = receive_read_payload_and_next_header(ra, 0, NULL);
                 return (err);
         }
         case DRR_END:
         {
+                if (ra->krrp_task == NULL ||
+                    ra->krrp_task->buffer_args.force_cksum) {
                 struct drr_end *drre = &ra->rrd->header.drr_u.drr_end;
-                if (!ZIO_CHECKSUM_EQUAL(ra->prev_cksum, drre->drr_checksum))
+                        if (!ZIO_CHECKSUM_EQUAL(ra->prev_cksum,
+                            drre->drr_checksum))
                         return (SET_ERROR(ECKSUM));
+                }
                 return (0);
         }
         case DRR_SPILL:
         {
                 struct drr_spill *drrs = &ra->rrd->header.drr_u.drr_spill;

@@ -2868,11 +3033,11 @@
  *
  * NB: callers *must* call dmu_recv_end() if this succeeds.
  */
 int
 dmu_recv_stream(dmu_recv_cookie_t *drc, vnode_t *vp, offset_t *voffp,
-    int cleanup_fd, uint64_t *action_handlep)
+    int cleanup_fd, uint64_t *action_handlep, dmu_krrp_task_t *krrp_task)
 {
         int err = 0;
         struct receive_arg ra = { 0 };
         struct receive_writer_arg rwa = { 0 };
         int featureflags;

@@ -2880,10 +3045,11 @@
 
         ra.byteswap = drc->drc_byteswap;
         ra.cksum = drc->drc_cksum;
         ra.vp = vp;
         ra.voff = *voffp;
+        ra.krrp_task = krrp_task;
 
         if (dsl_dataset_is_zapified(drc->drc_ds)) {
                 (void) zap_lookup(drc->drc_ds->ds_dir->dd_pool->dp_meta_objset,
                     drc->drc_ds->ds_object, DS_FIELD_RESUME_BYTES,
                     sizeof (ra.bytes_read), 1, &ra.bytes_read);

@@ -2988,11 +3154,11 @@
          * has been handed off to the writer thread who will free it.  Finally,
          * if receive_read_record fails or we're at the end of the stream, then
          * we free ra.rrd and exit.
          */
         while (rwa.err == 0) {
-                if (issig(JUSTLOOKING) && issig(FORREAL)) {
+                if (vp != NULL && issig(JUSTLOOKING) && issig(FORREAL)) {
                         err = SET_ERROR(EINTR);
                         break;
                 }
 
                 ASSERT3P(ra.rrd, ==, NULL);

@@ -3054,10 +3220,22 @@
         dsl_pool_t *dp = dmu_tx_pool(tx);
         int error;
 
         ASSERT3P(drc->drc_ds->ds_owner, ==, dmu_recv_tag);
 
+        if (spa_feature_is_active(dp->dp_spa, SPA_FEATURE_WBC)) {
+                objset_t *os = NULL;
+
+                error = dmu_objset_from_ds(drc->drc_ds, &os);
+                if (error != 0)
+                        return (error);
+
+                /* Receiving into a dataset that uses WBC is not supported */
+                if (os->os_wbc_mode != ZFS_WBC_MODE_OFF)
+                        return (SET_ERROR(EKZFS_WBCNOTSUP));
+        }
+
         if (!drc->drc_newfs) {
                 dsl_dataset_t *origin_head;
 
                 error = dsl_dataset_hold(dp, drc->drc_tofs, FTAG, &origin_head);
                 if (error != 0)

@@ -3110,10 +3288,23 @@
                 error = dsl_destroy_head_check_impl(drc->drc_ds, 1);
         } else {
                 error = dsl_dataset_snapshot_check_impl(drc->drc_ds,
                     drc->drc_tosnap, tx, B_TRUE, 1, drc->drc_cred);
         }
+
+        if (dmu_tx_is_syncing(tx) && drc->drc_krrp_task != NULL) {
+                const char *token =
+                    drc->drc_krrp_task->buffer_args.to_ds;
+                const char *cookie = drc->drc_krrp_task->cookie;
+
+                if (*token != '\0') {
+                        error = zap_update(dp->dp_meta_objset,
+                            DMU_POOL_DIRECTORY_OBJECT, token, 1,
+                            strlen(cookie) + 1, cookie, tx);
+                }
+        }
         return (error);
 }
 
 static void
 dmu_recv_end_sync(void *arg, dmu_tx_t *tx)