NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-8521 zdb -h <pool> raises core dump
Reviewed by: Alex Deiter <alex.deiter@nexenta.com>
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
SUP-918: zdb -h infinite loop when buffering records larger than static limit
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
NEX-3650 KRRP needs to clean up cstyle, hdrchk, and mapfile issues
Reviewed by: Jean McCormack <jean.mccormack@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-3214 remove cos object type from dmu.h
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
6391 Override default SPA config location via environment
Reviewed by: Brian Behlendorf <behlendorf1@llnl.gov>
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: George Wilson <george.wilson@delphix.com>
Approved by: Robert Mustacchi <rm@joyent.com>
6268 zfs diff confused by moving a file to another directory
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Approved by: Dan McDonald <danmcd@omniti.com>
6290 zdb -h overflows stack
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Brian Donohue <brian.donohue@delphix.com>
Reviewed by: Xin Li <delphij@freebsd.org>
Reviewed by: Don Brady <dev.fs.zfs@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
6047 SPARC boot should support feature@embedded_data
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-4582 update wrc test cases to allow using write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5812 assertion failed in zrl_tryenter(): zr_owner==NULL
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Gordon Ross <gwr@nexenta.com>
5810 zdb should print details of bpobj
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Approved by: Gordon Ross <gwr@nexenta.com>
NEX-3558 KRRP Integration
NEX-3212 remove vdev prop object type from dmu.h
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
Make special vdev subtree topology the same as regular vdev subtree to simplify testcase setup
Fixup merge issues
Issue #40: ZDB shouldn't crash with new code
re #12611 rb4105 zpool import panic in ddt_zap_count()
re #8279 rb3915 need a mechanism to notify NMS about ZFS config changes (fix lint - courtesy of Yuri Pankov)
re #12584 rb4049 zfsxx latest code merge (fix lint - courtesy of Yuri Pankov)
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties

   1 /*
   2  * CDDL HEADER START
   3  *
   4  * The contents of this file are subject to the terms of the
   5  * Common Development and Distribution License (the "License").
   6  * You may not use this file except in compliance with the License.
   7  *
   8  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
   9  * or http://www.opensolaris.org/os/licensing.
  10  * See the License for the specific language governing permissions
  11  * and limitations under the License.
  12  *
  13  * When distributing Covered Code, include this CDDL HEADER in each
  14  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15  * If applicable, add the following below this CDDL HEADER, with the
  16  * fields enclosed by brackets "[]" replaced with your own identifying
  17  * information: Portions Copyright [yyyy] [name of copyright owner]
  18  *
  19  * CDDL HEADER END
  20  */
  21 
  22 /*
  23  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  24  * Copyright (c) 2011, 2017 by Delphix. All rights reserved.
  25  * Copyright (c) 2014 Integros [integros.com]
  26  * Copyright 2017 Nexenta Systems, Inc.
  27  * Copyright 2017 RackTop Systems.
  28  */
  29 
  30 #include <stdio.h>
  31 #include <unistd.h>
  32 #include <stdio_ext.h>
  33 #include <stdlib.h>
  34 #include <ctype.h>
  35 #include <sys/zfs_context.h>
  36 #include <sys/spa.h>
  37 #include <sys/spa_impl.h>
  38 #include <sys/dmu.h>
  39 #include <sys/zap.h>
  40 #include <sys/fs/zfs.h>
  41 #include <sys/zfs_znode.h>
  42 #include <sys/zfs_sa.h>
  43 #include <sys/sa.h>
  44 #include <sys/sa_impl.h>
  45 #include <sys/vdev.h>
  46 #include <sys/vdev_impl.h>
  47 #include <sys/metaslab_impl.h>
  48 #include <sys/dmu_objset.h>
  49 #include <sys/dsl_dir.h>
  50 #include <sys/dsl_dataset.h>
  51 #include <sys/dsl_pool.h>
  52 #include <sys/dbuf.h>
  53 #include <sys/zil.h>
  54 #include <sys/zil_impl.h>


  61 #include <sys/arc.h>
  62 #include <sys/ddt.h>
  63 #include <sys/zfeature.h>
  64 #include <sys/abd.h>
  65 #include <sys/blkptr.h>
  66 #include <zfs_comutil.h>
  67 #include <libcmdutils.h>
  68 #undef verify
  69 #include <libzfs.h>
  70 
  71 #include "zdb.h"
  72 
  73 #define ZDB_COMPRESS_NAME(idx) ((idx) < ZIO_COMPRESS_FUNCTIONS ?     \
  74         zio_compress_table[(idx)].ci_name : "UNKNOWN")
  75 #define ZDB_CHECKSUM_NAME(idx) ((idx) < ZIO_CHECKSUM_FUNCTIONS ?     \
  76         zio_checksum_table[(idx)].ci_name : "UNKNOWN")
  77 #define ZDB_OT_NAME(idx) ((idx) < DMU_OT_NUMTYPES ?  \
  78         dmu_ot[(idx)].ot_name : DMU_OT_IS_VALID(idx) ?  \
  79         dmu_ot_byteswap[DMU_OT_BYTESWAP(idx)].ob_name : "UNKNOWN")
  80 #define ZDB_OT_TYPE(idx) ((idx) < DMU_OT_NUMTYPES ? (idx) :          \
  81         (idx) == DMU_OTN_ZAP_DATA || (idx) == DMU_OTN_ZAP_METADATA ?    \
  82         DMU_OT_ZAP_OTHER : \
  83         (idx) == DMU_OTN_UINT64_DATA || (idx) == DMU_OTN_UINT64_METADATA ? \
  84         DMU_OT_UINT64_OTHER : DMU_OT_NUMTYPES)
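The ZDB_OT_TYPE macro above collapses extended ("new"-style) DMU object types into coarse legacy buckets so that block statistics can be accumulated in fixed-size arrays. A minimal standalone sketch of that mapping follows; the constant values are stand-ins (the real definitions live in sys/dmu.h and differ), and the METADATA variants handled by the real macro are elided for brevity:

```c
#include <assert.h>

/*
 * Stand-in values for the sys/dmu.h constants referenced by ZDB_OT_TYPE;
 * they exist only to make this sketch self-contained.
 */
#define	DMU_OT_NUMTYPES		54
#define	DMU_OT_NEWTYPE		0x80
#define	DMU_OTN_UINT64_DATA	(DMU_OT_NEWTYPE | 0x01)
#define	DMU_OTN_ZAP_DATA	(DMU_OT_NEWTYPE | 0x02)
#define	DMU_OT_UINT64_OTHER	10
#define	DMU_OT_ZAP_OTHER	11

/*
 * Function form of the ZDB_OT_TYPE mapping: legacy indices pass through,
 * "new"-style typed objects collapse into a legacy bucket, and anything
 * unrecognized falls back to DMU_OT_NUMTYPES (i.e. "other").
 */
static int
zdb_ot_type(int idx)
{
	if (idx < DMU_OT_NUMTYPES)
		return (idx);
	if (idx == DMU_OTN_ZAP_DATA)
		return (DMU_OT_ZAP_OTHER);
	if (idx == DMU_OTN_UINT64_DATA)
		return (DMU_OT_UINT64_OTHER);
	return (DMU_OT_NUMTYPES);
}
```

The fallback bucket is what lets zdb's zcb_type[][] arrays stay bounded even when a pool contains object types the binary does not know by name.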
  85 
  86 #ifndef lint
  87 extern int reference_tracking_enable;
  88 extern boolean_t zfs_recover;
  89 extern uint64_t zfs_arc_max, zfs_arc_meta_limit;
  90 extern int zfs_vdev_async_read_max_active;
  91 extern int aok;
  92 extern boolean_t spa_load_verify_dryrun;
  93 #else
  94 int reference_tracking_enable;
  95 boolean_t zfs_recover;
  96 uint64_t zfs_arc_max, zfs_arc_meta_limit;
  97 int zfs_vdev_async_read_max_active;
  98 int aok;
  99 boolean_t spa_load_verify_dryrun;
 100 #endif
 101 
 102 static const char cmdname[] = "zdb";
 103 uint8_t dump_opt[256];
 104 
 105 typedef void object_viewer_t(objset_t *, uint64_t, void *data, size_t size);
 106 
 107 uint64_t *zopt_object = NULL;
 108 static unsigned zopt_objects = 0;
 109 libzfs_handle_t *g_zfs;
 110 uint64_t max_inflight = 1000;
 111 
 112 static void snprintf_blkptr_compact(char *, size_t, const blkptr_t *);
 113 
 114 /*
 115  * These libumem hooks provide a reasonable set of defaults for the allocator's
 116  * debugging facilities.
 117  */
 118 const char *
 119 _umem_debug_init()


 657 
 658         if (vd->vdev_ops->vdev_op_leaf) {
 659                 space_map_t *sm = vd->vdev_dtl_sm;
 660 
 661                 if (sm != NULL &&
 662                     sm->sm_dbuf->db_size == sizeof (space_map_phys_t))
 663                         return (1);
 664                 return (0);
 665         }
 666 
 667         for (unsigned c = 0; c < vd->vdev_children; c++)
 668                 refcount += get_dtl_refcount(vd->vdev_child[c]);
 669         return (refcount);
 670 }
 671 
 672 static int
 673 get_metaslab_refcount(vdev_t *vd)
 674 {
 675         int refcount = 0;
 676 
 677         if (vd->vdev_top == vd) {
 678                 for (uint64_t m = 0; m < vd->vdev_ms_count; m++) {
 679                         space_map_t *sm = vd->vdev_ms[m]->ms_sm;
 680 
 681                         if (sm != NULL &&
 682                             sm->sm_dbuf->db_size == sizeof (space_map_phys_t))
 683                                 refcount++;
 684                 }
 685         }
 686         for (unsigned c = 0; c < vd->vdev_children; c++)
 687                 refcount += get_metaslab_refcount(vd->vdev_child[c]);
 688 
 689         return (refcount);
 690 }
 691 
 692 static int
 693 get_obsolete_refcount(vdev_t *vd)
 694 {
 695         int refcount = 0;
 696 
 697         uint64_t obsolete_sm_obj = vdev_obsolete_sm_object(vd);
 698         if (vd->vdev_top == vd && obsolete_sm_obj != 0) {
 699                 dmu_object_info_t doi;
 700                 VERIFY0(dmu_object_info(vd->vdev_spa->spa_meta_objset,
 701                     obsolete_sm_obj, &doi));
 702                 if (doi.doi_bonus_size == sizeof (space_map_phys_t)) {
 703                         refcount++;
 704                 }
 705         } else {
 706                 ASSERT3P(vd->vdev_obsolete_sm, ==, NULL);
 707                 ASSERT3U(obsolete_sm_obj, ==, 0);
 708         }
 709         for (unsigned c = 0; c < vd->vdev_children; c++) {
 710                 refcount += get_obsolete_refcount(vd->vdev_child[c]);
 711         }
 712 
 713         return (refcount);
 714 }
 715 
 716 static int
 717 get_prev_obsolete_spacemap_refcount(spa_t *spa)
 718 {
 719         uint64_t prev_obj =
 720             spa->spa_condensing_indirect_phys.scip_prev_obsolete_sm_object;
 721         if (prev_obj != 0) {
 722                 dmu_object_info_t doi;
 723                 VERIFY0(dmu_object_info(spa->spa_meta_objset, prev_obj, &doi));
 724                 if (doi.doi_bonus_size == sizeof (space_map_phys_t)) {
 725                         return (1);
 726                 }
 727         }
 728         return (0);
 729 }
 730 
 731 static int
 732 verify_spacemap_refcounts(spa_t *spa)
 733 {
 734         uint64_t expected_refcount = 0;
 735         uint64_t actual_refcount;
 736 
 737         (void) feature_get_refcount(spa,
 738             &spa_feature_table[SPA_FEATURE_SPACEMAP_HISTOGRAM],
 739             &expected_refcount);
 740         actual_refcount = get_dtl_refcount(spa->spa_root_vdev);
 741         actual_refcount += get_metaslab_refcount(spa->spa_root_vdev);
 742         actual_refcount += get_obsolete_refcount(spa->spa_root_vdev);
 743         actual_refcount += get_prev_obsolete_spacemap_refcount(spa);
 744 
 745         if (expected_refcount != actual_refcount) {
 746                 (void) printf("space map refcount mismatch: expected %lld != "
 747                     "actual %lld\n",
 748                     (longlong_t)expected_refcount,
 749                     (longlong_t)actual_refcount);
 750                 return (2);
 751         }
 752         return (0);
 753 }
 754 
 755 static void
 756 dump_spacemap(objset_t *os, space_map_t *sm)
 757 {
 758         uint64_t alloc, offset, entry;
 759         char *ddata[] = { "ALLOC", "FREE", "CONDENSE", "INVALID",
 760             "INVALID", "INVALID", "INVALID", "INVALID" };
 761 
 762         if (sm == NULL)
 763                 return;
 764 
 765         (void) printf("space map object %llu:\n",
 766             (longlong_t)sm->sm_phys->smp_object);
 767         (void) printf("  smp_objsize = 0x%llx\n",
 768             (longlong_t)sm->sm_phys->smp_objsize);
 769         (void) printf("  smp_alloc = 0x%llx\n",
 770             (longlong_t)sm->sm_phys->smp_alloc);
 771 
 772         /*
 773          * Print out the freelist entries in both encoded and decoded form.
 774          */
 775         alloc = 0;
 776         for (offset = 0; offset < space_map_length(sm);
 777             offset += sizeof (entry)) {
 778                 uint8_t mapshift = sm->sm_shift;
 779 
 780                 VERIFY0(dmu_read(os, space_map_object(sm), offset,
 781                     sizeof (entry), &entry, DMU_READ_PREFETCH));
 782                 if (SM_DEBUG_DECODE(entry)) {
 783 
 784                         (void) printf("\t    [%6llu] %s: txg %llu, pass %llu\n",
 785                             (u_longlong_t)(offset / sizeof (entry)),
 786                             ddata[SM_DEBUG_ACTION_DECODE(entry)],
 787                             (u_longlong_t)SM_DEBUG_TXG_DECODE(entry),
 788                             (u_longlong_t)SM_DEBUG_SYNCPASS_DECODE(entry));
 789                 } else {
 790                         (void) printf("\t    [%6llu]    %c  range:"
 791                             " %010llx-%010llx  size: %06llx\n",


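The decode step in the loop above unpacks one 64-bit (pre-spacemap-v2) space map word. A hedged standalone sketch of that unpacking: the field positions mirror the BF64_* macros in sys/space_map.h as I understand them (bit 63 is the debug flag; non-debug entries pack a 47-bit offset in units of 1 << sm_shift, a 1-bit alloc/free type, and a 15-bit run length stored as length - 1), but the SME_* names are invented for this sketch:

```c
#include <stdint.h>

/* Field extractors for a one-word space map entry (layout per sys/space_map.h). */
#define	SME_DEBUG(e)	(((e) >> 63) & 1ULL)
#define	SME_OFFSET(e)	(((e) >> 16) & ((1ULL << 47) - 1))
#define	SME_TYPE(e)	(((e) >> 15) & 1ULL)	/* 0 = ALLOC, 1 = FREE */
#define	SME_RUN(e)	(((e) & 0x7fffULL) + 1)	/* stored as (length - 1) */

/* Decode an entry into byte-addressed start/size for a given sm_shift. */
static void
sm_entry_decode(uint64_t e, uint8_t shift, uint64_t *start, uint64_t *size,
    int *is_free)
{
	*start = SME_OFFSET(e) << shift;
	*size = SME_RUN(e) << shift;
	*is_free = (int)SME_TYPE(e);
}
```

This is why dump_spacemap() multiplies nothing explicitly: the shift by sm_shift converts the stored segment-sized units back into byte offsets and lengths for printing.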
 856                 dump_metaslab_stats(msp);
 857                 metaslab_unload(msp);
 858                 mutex_exit(&msp->ms_lock);
 859         }
 860 
 861         if (dump_opt['m'] > 1 && sm != NULL &&
 862             spa_feature_is_active(spa, SPA_FEATURE_SPACEMAP_HISTOGRAM)) {
 863                 /*
 864                  * The space map histogram represents free space in chunks
 865                  * of sm_shift (i.e. bucket 0 refers to 2^sm_shift).
 866                  */
 867                 (void) printf("\tOn-disk histogram:\t\tfragmentation %llu\n",
 868                     (u_longlong_t)msp->ms_fragmentation);
 869                 dump_histogram(sm->sm_phys->smp_histogram,
 870                     SPACE_MAP_HISTOGRAM_SIZE, sm->sm_shift);
 871         }
 872 
 873         if (dump_opt['d'] > 5 || dump_opt['m'] > 3) {
 874                 ASSERT(msp->ms_size == (1ULL << vd->vdev_ms_shift));
 875 
 876                 dump_spacemap(spa->spa_meta_objset, msp->ms_sm);
 877         }
 878 }
 879 
 880 static void
 881 print_vdev_metaslab_header(vdev_t *vd)
 882 {
 883         (void) printf("\tvdev %10llu\n\t%-10s%5llu   %-19s   %-15s   %-10s\n",
 884             (u_longlong_t)vd->vdev_id,
 885             "metaslabs", (u_longlong_t)vd->vdev_ms_count,
 886             "offset", "spacemap", "free");
 887         (void) printf("\t%15s   %19s   %15s   %10s\n",
 888             "---------------", "-------------------",
 889             "---------------", "-------------");
 890 }
 891 
 892 static void
 893 dump_metaslab_groups(spa_t *spa)
 894 {
 895         vdev_t *rvd = spa->spa_root_vdev;
 896         metaslab_class_t *mc = spa_normal_class(spa);


 914                     (u_longlong_t)tvd->vdev_ms_count);
 915                 if (mg->mg_fragmentation == ZFS_FRAG_INVALID) {
 916                         (void) printf("%3s\n", "-");
 917                 } else {
 918                         (void) printf("%3llu%%\n",
 919                             (u_longlong_t)mg->mg_fragmentation);
 920                 }
 921                 dump_histogram(mg->mg_histogram, RANGE_TREE_HISTOGRAM_SIZE, 0);
 922         }
 923 
 924         (void) printf("\tpool %s\tfragmentation", spa_name(spa));
 925         fragmentation = metaslab_class_fragmentation(mc);
 926         if (fragmentation == ZFS_FRAG_INVALID)
 927                 (void) printf("\t%3s\n", "-");
 928         else
 929                 (void) printf("\t%3llu%%\n", (u_longlong_t)fragmentation);
 930         dump_histogram(mc->mc_histogram, RANGE_TREE_HISTOGRAM_SIZE, 0);
 931 }
 932 
 933 static void
 934 print_vdev_indirect(vdev_t *vd)
 935 {
 936         vdev_indirect_config_t *vic = &vd->vdev_indirect_config;
 937         vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping;
 938         vdev_indirect_births_t *vib = vd->vdev_indirect_births;
 939 
 940         if (vim == NULL) {
 941                 ASSERT3P(vib, ==, NULL);
 942                 return;
 943         }
 944 
 945         ASSERT3U(vdev_indirect_mapping_object(vim), ==,
 946             vic->vic_mapping_object);
 947         ASSERT3U(vdev_indirect_births_object(vib), ==,
 948             vic->vic_births_object);
 949 
 950         (void) printf("indirect births obj %llu:\n",
 951             (longlong_t)vic->vic_births_object);
 952         (void) printf("    vib_count = %llu\n",
 953             (longlong_t)vdev_indirect_births_count(vib));
 954         for (uint64_t i = 0; i < vdev_indirect_births_count(vib); i++) {
 955                 vdev_indirect_birth_entry_phys_t *cur_vibe =
 956                     &vib->vib_entries[i];
 957                 (void) printf("\toffset %llx -> txg %llu\n",
 958                     (longlong_t)cur_vibe->vibe_offset,
 959                     (longlong_t)cur_vibe->vibe_phys_birth_txg);
 960         }
 961         (void) printf("\n");
 962 
 963         (void) printf("indirect mapping obj %llu:\n",
 964             (longlong_t)vic->vic_mapping_object);
 965         (void) printf("    vim_max_offset = 0x%llx\n",
 966             (longlong_t)vdev_indirect_mapping_max_offset(vim));
 967         (void) printf("    vim_bytes_mapped = 0x%llx\n",
 968             (longlong_t)vdev_indirect_mapping_bytes_mapped(vim));
 969         (void) printf("    vim_count = %llu\n",
 970             (longlong_t)vdev_indirect_mapping_num_entries(vim));
 971 
 972         if (dump_opt['d'] <= 5 && dump_opt['m'] <= 3)
 973                 return;
 974 
 975         uint32_t *counts = vdev_indirect_mapping_load_obsolete_counts(vim);
 976 
 977         for (uint64_t i = 0; i < vdev_indirect_mapping_num_entries(vim); i++) {
 978                 vdev_indirect_mapping_entry_phys_t *vimep =
 979                     &vim->vim_entries[i];
 980                 (void) printf("\t<%llx:%llx:%llx> -> "
 981                     "<%llx:%llx:%llx> (%x obsolete)\n",
 982                     (longlong_t)vd->vdev_id,
 983                     (longlong_t)DVA_MAPPING_GET_SRC_OFFSET(vimep),
 984                     (longlong_t)DVA_GET_ASIZE(&vimep->vimep_dst),
 985                     (longlong_t)DVA_GET_VDEV(&vimep->vimep_dst),
 986                     (longlong_t)DVA_GET_OFFSET(&vimep->vimep_dst),
 987                     (longlong_t)DVA_GET_ASIZE(&vimep->vimep_dst),
 988                     counts[i]);
 989         }
 990         (void) printf("\n");
 991 
 992         uint64_t obsolete_sm_object = vdev_obsolete_sm_object(vd);
 993         if (obsolete_sm_object != 0) {
 994                 objset_t *mos = vd->vdev_spa->spa_meta_objset;
 995                 (void) printf("obsolete space map object %llu:\n",
 996                     (u_longlong_t)obsolete_sm_object);
 997                 ASSERT(vd->vdev_obsolete_sm != NULL);
 998                 ASSERT3U(space_map_object(vd->vdev_obsolete_sm), ==,
 999                     obsolete_sm_object);
1000                 dump_spacemap(mos, vd->vdev_obsolete_sm);
1001                 (void) printf("\n");
1002         }
1003 }
1004 
1005 static void
1006 dump_metaslabs(spa_t *spa)
1007 {
1008         vdev_t *vd, *rvd = spa->spa_root_vdev;
1009         uint64_t m, c = 0, children = rvd->vdev_children;
1010 
1011         (void) printf("\nMetaslabs:\n");
1012 
1013         if (!dump_opt['d'] && zopt_objects > 0) {
1014                 c = zopt_object[0];
1015 
1016                 if (c >= children)
1017                         (void) fatal("bad vdev id: %llu", (u_longlong_t)c);
1018 
1019                 if (zopt_objects > 1) {
1020                         vd = rvd->vdev_child[c];
1021                         print_vdev_metaslab_header(vd);
1022 
1023                         for (m = 1; m < zopt_objects; m++) {
1024                                 if (zopt_object[m] < vd->vdev_ms_count)
1025                                         dump_metaslab(
1026                                             vd->vdev_ms[zopt_object[m]]);
1027                                 else
1028                                         (void) fprintf(stderr, "bad metaslab "
1029                                             "number %llu\n",
1030                                             (u_longlong_t)zopt_object[m]);
1031                         }
1032                         (void) printf("\n");
1033                         return;
1034                 }
1035                 children = c + 1;
1036         }
1037         for (; c < children; c++) {
1038                 vd = rvd->vdev_child[c];
1039                 print_vdev_metaslab_header(vd);
1040 
1041                 print_vdev_indirect(vd);
1042 
1043                 for (m = 0; m < vd->vdev_ms_count; m++)
1044                         dump_metaslab(vd->vdev_ms[m]);
1045                 (void) printf("\n");
1046         }
1047 }
1048 
1049 static void
1050 dump_dde(const ddt_t *ddt, const ddt_entry_t *dde, uint64_t index)
1051 {
1052         const ddt_phys_t *ddp = dde->dde_phys;
1053         const ddt_key_t *ddk = &dde->dde_key;
1054         const char *types[4] = { "ditto", "single", "double", "triple" };
1055         char blkbuf[BP_SPRINTF_LEN];
1056         blkptr_t blk;
1057 
1058         for (int p = 0; p < DDT_PHYS_TYPES; p++, ddp++) {
1059                 if (ddp->ddp_phys_birth == 0)
1060                         continue;
1061                 ddt_bp_create(ddt->ddt_checksum, ddk, ddp, &blk);
1062                 snprintf_blkptr(blkbuf, sizeof (blkbuf), &blk);


1087             "dedup * compress / copies = %.2f\n\n",
1088             dedup, compress, copies, dedup * compress / copies);
1089 }
1090 
1091 static void
1092 dump_ddt(ddt_t *ddt, enum ddt_type type, enum ddt_class class)
1093 {
1094         char name[DDT_NAMELEN];
1095         ddt_entry_t dde;
1096         uint64_t walk = 0;
1097         dmu_object_info_t doi;
1098         uint64_t count, dspace, mspace;
1099         int error;
1100 
1101         error = ddt_object_info(ddt, type, class, &doi);
1102 
1103         if (error == ENOENT)
1104                 return;
1105         ASSERT(error == 0);
1106 
1107         if ((count = ddt_object_count(ddt, type, class)) == 0)
1108                 return;
1109 
1110         dspace = doi.doi_physical_blocks_512 << 9;
1111         mspace = doi.doi_fill_count * doi.doi_data_block_size;
1112 
1113         ddt_object_name(ddt, type, class, name);
1114 
1115         (void) printf("%s: %llu entries, size %llu on disk, %llu in core\n",
1116             name,
1117             (u_longlong_t)count,
1118             (u_longlong_t)(dspace / count),
1119             (u_longlong_t)(mspace / count));
1120 
1121         if (dump_opt['D'] < 3)
1122                 return;
1123 
1124         zpool_dump_ddt(NULL, &ddt->ddt_histogram[type][class]);
1125 
1126         if (dump_opt['D'] < 4)
1127                 return;


1198         char prefix[256];
1199 
1200         spa_vdev_state_enter(spa, SCL_NONE);
1201         required = vdev_dtl_required(vd);
1202         (void) spa_vdev_state_exit(spa, NULL, 0);
1203 
1204         if (indent == 0)
1205                 (void) printf("\nDirty time logs:\n\n");
1206 
1207         (void) printf("\t%*s%s [%s]\n", indent, "",
1208             vd->vdev_path ? vd->vdev_path :
1209             vd->vdev_parent ? vd->vdev_ops->vdev_op_type : spa_name(spa),
1210             required ? "DTL-required" : "DTL-expendable");
1211 
1212         for (int t = 0; t < DTL_TYPES; t++) {
1213                 range_tree_t *rt = vd->vdev_dtl[t];
1214                 if (range_tree_space(rt) == 0)
1215                         continue;
1216                 (void) snprintf(prefix, sizeof (prefix), "\t%*s%s",
1217                     indent + 2, "", name[t]);
1218                 range_tree_walk(rt, dump_dtl_seg, prefix);
1219                 if (dump_opt['d'] > 5 && vd->vdev_children == 0)
1220                         dump_spacemap(spa->spa_meta_objset, vd->vdev_dtl_sm);
1221         }
1222 
1223         for (unsigned c = 0; c < vd->vdev_children; c++)
1224                 dump_dtl(vd->vdev_child[c], indent + 4);
1225 }
1226 
1227 static void
1228 dump_history(spa_t *spa)
1229 {
1230         nvlist_t **events = NULL;
1231         uint64_t resid, len, off = 0;
1232         uint_t num = 0;
1233         int error;
1234         time_t tsec;
1235         struct tm t;
1236         char tbuf[30];
1237         char internalstr[MAXPATHLEN];
1238 
1239         char *buf = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
1240         do {
1241                 len = SPA_MAXBLOCKSIZE;
1242 
1243                 if ((error = spa_history_get(spa, &off, &len, buf)) != 0) {
1244                         (void) fprintf(stderr, "Unable to read history: "
1245                             "error %d\n", error);
1246                         umem_free(buf, SPA_MAXBLOCKSIZE);
1247                         return;
1248                 }
1249 
1250                 if (zpool_history_unpack(buf, len, &resid, &events, &num) != 0)
1251                         break;
1252 
1253                 off -= resid;
1254         } while (len != 0);
1255         umem_free(buf, SPA_MAXBLOCKSIZE);
1256 
1257         (void) printf("\nHistory:\n");
1258         for (unsigned i = 0; i < num; i++) {
1259                 uint64_t time, txg, ievent;
1260                 char *cmd, *intstr;
1261                 boolean_t printed = B_FALSE;
1262 
1263                 if (nvlist_lookup_uint64(events[i], ZPOOL_HIST_TIME,
1264                     &time) != 0)
1265                         goto next;
1266                 if (nvlist_lookup_string(events[i], ZPOOL_HIST_CMD,
1267                     &cmd) != 0) {
1268                         if (nvlist_lookup_uint64(events[i],
1269                             ZPOOL_HIST_INT_EVENT, &ievent) != 0)
1270                                 goto next;
1271                         verify(nvlist_lookup_uint64(events[i],
1272                             ZPOOL_HIST_TXG, &txg) == 0);
1273                         verify(nvlist_lookup_string(events[i],
1274                             ZPOOL_HIST_INT_STR, &intstr) == 0);
1275                         if (ievent >= ZFS_NUM_LEGACY_HISTORY_EVENTS)
1276                                 goto next;


1278                         (void) snprintf(internalstr,
1279                             sizeof (internalstr),
1280                             "[internal %s txg:%ju] %s",
1281                             zfs_history_event_names[ievent], (uintmax_t)txg,
1282                             intstr);
1283                         cmd = internalstr;
1284                 }
1285                 tsec = time;
1286                 (void) localtime_r(&tsec, &t);
1287                 (void) strftime(tbuf, sizeof (tbuf), "%F.%T", &t);
1288                 (void) printf("%s %s\n", tbuf, cmd);
1289                 printed = B_TRUE;
1290 
1291 next:
1292                 if (dump_opt['h'] > 1) {
1293                         if (!printed)
1294                                 (void) printf("unrecognized record:\n");
1295                         dump_nvlist(events[i], 2);
1296                 }
1297         }
1298 }
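dump_history() above reads the pool history in SPA_MAXBLOCKSIZE chunks and rewinds off by the residual reported by zpool_history_unpack(), so a record split across the buffer boundary is re-read whole on the next pass; a record larger than the buffer must also terminate the loop (the zdb -h hang fixed by SUP-918/6290). A self-contained sketch of that residual-rewind loop over a toy length-prefixed record stream; all names and the record format here are illustrative, not the ZFS API:

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	BUF_LEN	8	/* toy stand-in for SPA_MAXBLOCKSIZE */

/* Count whole records in buf; *resid = bytes of a trailing partial record. */
static unsigned
unpack_records(const uint8_t *buf, size_t len, size_t *resid)
{
	unsigned count = 0;
	size_t pos = 0;

	while (pos < len && pos + 1 + buf[pos] <= len) {
		pos += 1 + buf[pos];	/* 1-byte length prefix + payload */
		count++;
	}
	*resid = len - pos;
	return (count);
}

static unsigned
read_all_records(const uint8_t *stream, size_t stream_len)
{
	uint8_t buf[BUF_LEN];
	size_t off = 0, len, resid;
	unsigned total = 0;

	do {
		len = stream_len - off < BUF_LEN ? stream_len - off : BUF_LEN;
		memcpy(buf, stream + off, len);
		off += len;
		total += unpack_records(buf, len, &resid);
		off -= resid;	/* re-read the split record next pass */
		/* resid == BUF_LEN would mean a record larger than the buffer */
	} while (len != 0 && resid < BUF_LEN);
	return (total);
}
```

The loop guard is the interesting part: without the residual bound, a single oversized record keeps producing resid == len and off never advances, which is exactly the infinite-loop failure mode the tickets describe.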
1299 
1300 /*ARGSUSED*/
1301 static void
1302 dump_dnode(objset_t *os, uint64_t object, void *data, size_t size)
1303 {
1304 }
1305 
1306 static uint64_t
1307 blkid2offset(const dnode_phys_t *dnp, const blkptr_t *bp,
1308     const zbookmark_phys_t *zb)
1309 {
1310         if (dnp == NULL) {
1311                 ASSERT(zb->zb_level < 0);
1312                 if (zb->zb_object == 0)
1313                         return (zb->zb_blkid);
1314                 return (zb->zb_blkid * BP_GET_LSIZE(bp));
1315         }
1316 
1317         ASSERT(zb->zb_level >= 0);


2192 
2193         dmu_objset_name(os, osname);
2194 
2195         (void) printf("Dataset %s [%s], ID %llu, cr_txg %llu, "
2196             "%s, %llu objects%s\n",
2197             osname, type, (u_longlong_t)dmu_objset_id(os),
2198             (u_longlong_t)dds.dds_creation_txg,
2199             numbuf, (u_longlong_t)usedobjs, blkbuf);
2200 
2201         if (zopt_objects != 0) {
2202                 for (i = 0; i < zopt_objects; i++)
2203                         dump_object(os, zopt_object[i], verbosity,
2204                             &print_header);
2205                 (void) printf("\n");
2206                 return;
2207         }
2208 
2209         if (dump_opt['i'] != 0 || verbosity >= 2)
2210                 dump_intent_log(dmu_objset_zil(os));
2211 
2212         if (dmu_objset_ds(os) != NULL) {
2213                 dsl_dataset_t *ds = dmu_objset_ds(os);
2214                 dump_deadlist(&ds->ds_deadlist);
2215 
2216                 if (dsl_dataset_remap_deadlist_exists(ds)) {
2217                         (void) printf("ds_remap_deadlist:\n");
2218                         dump_deadlist(&ds->ds_remap_deadlist);
2219                 }
2220         }
2221 
2222         if (verbosity < 2)
2223                 return;
2224 
2225         if (BP_IS_HOLE(os->os_rootbp))
2226                 return;
2227 
2228         dump_object(os, 0, verbosity, &print_header);
2229         object_count = 0;
2230         if (DMU_USERUSED_DNODE(os) != NULL &&
2231             DMU_USERUSED_DNODE(os)->dn_type != 0) {
2232                 dump_object(os, DMU_USERUSED_OBJECT, verbosity, &print_header);
2233                 dump_object(os, DMU_GROUPUSED_OBJECT, verbosity, &print_header);
2234         }
2235 
2236         object = 0;
2237         while ((error = dmu_object_next(os, &object, B_FALSE, 0)) == 0) {
2238                 dump_object(os, object, verbosity, &print_header);
2239                 object_count++;
2240         }
2241 


2544                         if (!dump_opt['q'])
2545                                 dump_nvlist(config, 4);
2546                         if ((nvlist_lookup_nvlist(config,
2547                             ZPOOL_CONFIG_VDEV_TREE, &vdev_tree) != 0) ||
2548                             (nvlist_lookup_uint64(vdev_tree,
2549                             ZPOOL_CONFIG_ASHIFT, &ashift) != 0))
2550                                 ashift = SPA_MINBLOCKSHIFT;
2551                         nvlist_free(config);
2552                         label_found = B_TRUE;
2553                 }
2554                 if (dump_opt['u'])
2555                         dump_label_uberblocks(&label, ashift);
2556         }
2557 
2558         (void) close(fd);
2559 
2560         return (label_found ? 0 : 2);
2561 }
2562 
2563 static uint64_t dataset_feature_count[SPA_FEATURES];
2564 static uint64_t remap_deadlist_count = 0;
2565 
2566 /*ARGSUSED*/
2567 static int
2568 dump_one_dir(const char *dsname, void *arg)
2569 {
2570         int error;
2571         objset_t *os;
2572 
2573         error = open_objset(dsname, DMU_OST_ANY, FTAG, &os);
2574         if (error != 0)
2575                 return (0);
2576 
2577         for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
2578                 if (!dmu_objset_ds(os)->ds_feature_inuse[f])
2579                         continue;
2580                 ASSERT(spa_feature_table[f].fi_flags &
2581                     ZFEATURE_FLAG_PER_DATASET);
2582                 dataset_feature_count[f]++;
2583         }
2584 
2585         if (dsl_dataset_remap_deadlist_exists(dmu_objset_ds(os))) {
2586                 remap_deadlist_count++;
2587         }
2588 
2589         dump_dir(os);
2590         close_objset(os, FTAG);
2591         fuid_table_destroy();
2592         return (0);
2593 }
2594 
2595 /*
2596  * Block statistics.
2597  */
2598 #define PSIZE_HISTO_SIZE (SPA_OLD_MAXBLOCKSIZE / SPA_MINBLOCKSIZE + 2)
2599 typedef struct zdb_blkstats {
2600         uint64_t zb_asize;
2601         uint64_t zb_lsize;
2602         uint64_t zb_psize;
2603         uint64_t zb_count;
2604         uint64_t zb_gangs;
2605         uint64_t zb_ditto_samevdev;
2606         uint64_t zb_psize_histogram[PSIZE_HISTO_SIZE];
2607 } zdb_blkstats_t;
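For reference, zb_psize_histogram is indexed by physical size in SPA_MINBLOCKSIZE (512-byte) units, which is what makes PSIZE_HISTO_SIZE equal SPA_OLD_MAXBLOCKSIZE / SPA_MINBLOCKSIZE + 2. A minimal standalone sketch of that bucketing, assuming the stock illumos values (SPA_MINBLOCKSHIFT of 9, SPA_OLD_MAXBLOCKSIZE of 128K); the real macros come from the ZFS headers:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed stand-ins for the ZFS header definitions. */
#define SPA_MINBLOCKSHIFT       9
#define SPA_MINBLOCKSIZE        (1ULL << SPA_MINBLOCKSHIFT)
#define SPA_OLD_MAXBLOCKSIZE    (128ULL * 1024)
#define PSIZE_HISTO_SIZE (SPA_OLD_MAXBLOCKSIZE / SPA_MINBLOCKSIZE + 2)

/* Map a block's physical size to its zb_psize_histogram bucket. */
static inline size_t
psize_histo_bucket(uint64_t psize)
{
        return ((size_t)(psize >> SPA_MINBLOCKSHIFT));
}
```

With these values the histogram has 258 buckets: one per 512-byte step up to 128K, plus the two extra slots.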
2608 
2609 /*
2610  * Extended object types to report deferred frees and dedup auto-ditto blocks.
2611  */
2612 #define ZDB_OT_DEFERRED (DMU_OT_NUMTYPES + 0)
2613 #define ZDB_OT_DITTO    (DMU_OT_NUMTYPES + 1)
2614 #define ZDB_OT_OTHER    (DMU_OT_NUMTYPES + 2)
2615 #define ZDB_OT_TOTAL    (DMU_OT_NUMTYPES + 3)
2616 
2617 static const char *zdb_ot_extname[] = {
2618         "deferred free",
2619         "dedup ditto",
2620         "other",
2621         "Total",
2622 };
2623 
2624 #define ZB_TOTAL        DN_MAX_LEVELS
2625 
2626 typedef struct zdb_cb {
2627         zdb_blkstats_t  zcb_type[ZB_TOTAL + 1][ZDB_OT_TOTAL + 1];
2628         uint64_t        zcb_removing_size;
2629         uint64_t        zcb_dedup_asize;
2630         uint64_t        zcb_dedup_blocks;
2631         uint64_t        zcb_embedded_blocks[NUM_BP_EMBEDDED_TYPES];
2632         uint64_t        zcb_embedded_histogram[NUM_BP_EMBEDDED_TYPES]
2633             [BPE_PAYLOAD_SIZE];
2634         uint64_t        zcb_start;
2635         hrtime_t        zcb_lastprint;
2636         uint64_t        zcb_totalasize;
2637         uint64_t        zcb_errors[256];
2638         int             zcb_readfails;
2639         int             zcb_haderrors;
2640         spa_t           *zcb_spa;
2641         uint32_t        **zcb_vd_obsolete_counts;
2642 } zdb_cb_t;
2643 
2644 static void
2645 zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const blkptr_t *bp,
2646     dmu_object_type_t type)
2647 {
2648         uint64_t refcnt = 0;
2649 
2650         ASSERT(type < ZDB_OT_TOTAL);
2651 
2652         if (zilog && zil_bp_tree_add(zilog, bp) != 0)
2653                 return;
2654 
2655         for (int i = 0; i < 4; i++) {
2656                 int l = (i < 2) ? BP_GET_LEVEL(bp) : ZB_TOTAL;
2657                 int t = (i & 1) ? type : ZDB_OT_TOTAL;
2658                 int equal;
2659                 zdb_blkstats_t *zb = &zcb->zcb_type[l][t];
2660 
2661                 zb->zb_asize += BP_GET_ASIZE(bp);


2692                         break;
2693                 }
2694 
2695         }
2696 
2697         if (BP_IS_EMBEDDED(bp)) {
2698                 zcb->zcb_embedded_blocks[BPE_GET_ETYPE(bp)]++;
2699                 zcb->zcb_embedded_histogram[BPE_GET_ETYPE(bp)]
2700                     [BPE_GET_PSIZE(bp)]++;
2701                 return;
2702         }
2703 
2704         if (dump_opt['L'])
2705                 return;
2706 
2707         if (BP_GET_DEDUP(bp)) {
2708                 ddt_t *ddt;
2709                 ddt_entry_t *dde;
2710 
2711                 ddt = ddt_select(zcb->zcb_spa, bp);
2712                 ddt_enter(ddt);
2713                 dde = ddt_lookup(ddt, bp, B_FALSE);
2714 
2715                 if (dde == NULL) {
2716                         refcnt = 0;
2717                 } else {
2718                         ddt_phys_t *ddp = ddt_phys_select(dde, bp);
2719                         ddt_phys_decref(ddp);
2720                         refcnt = ddp->ddp_refcnt;
2721                         if (ddt_phys_total_refcnt(dde) == 0)
2722                                 ddt_remove(ddt, dde);
2723                 }
2724                 ddt_exit(ddt);
2725         }
2726 
2727         VERIFY3U(zio_wait(zio_claim(NULL, zcb->zcb_spa,
2728             refcnt ? 0 : spa_first_txg(zcb->zcb_spa),
2729             bp, NULL, NULL, ZIO_FLAG_CANFAIL)), ==, 0);
2730 }
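The i < 4 loop above is compact: every block updates exactly four cells of zcb_type, the cross product of {its level, ZB_TOTAL} x {ZDB_OT_TOTAL, its type}. A self-contained sketch of that enumeration (TOY_ZB_TOTAL and TOY_OT_TOTAL are illustrative stand-ins, since the real constants depend on DN_MAX_LEVELS and DMU_OT_NUMTYPES):

```c
#include <assert.h>

/* Illustrative stand-ins for ZB_TOTAL and ZDB_OT_TOTAL. */
#define TOY_ZB_TOTAL    7
#define TOY_OT_TOTAL    57

typedef struct {
        int l;  /* level row: the block's level, or TOY_ZB_TOTAL */
        int t;  /* type column: the object type, or TOY_OT_TOTAL */
} toy_cell_t;

/*
 * Reproduce the four (level, type) cells that zdb_count_block()'s
 * i < 4 loop touches for a block at `level` with object type `type`.
 */
static void
toy_stat_cells(int level, int type, toy_cell_t out[4])
{
        for (int i = 0; i < 4; i++) {
                out[i].l = (i < 2) ? level : TOY_ZB_TOTAL;
                out[i].t = (i & 1) ? type : TOY_OT_TOTAL;
        }
}
```

This is why the per-type and grand-total rows printed by zdb -b always sum consistently: the totals are accumulated in the same pass as the individual cells.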
2731 
2732 static void
2733 zdb_blkptr_done(zio_t *zio)
2734 {
2735         spa_t *spa = zio->io_spa;
2736         blkptr_t *bp = zio->io_bp;
2737         int ioerr = zio->io_error;
2738         zdb_cb_t *zcb = zio->io_private;
2739         zbookmark_phys_t *zb = &zio->io_bookmark;
2740 
2741         abd_free(zio->io_abd);
2742 
2743         mutex_enter(&spa->spa_scrub_lock);
2744         spa->spa_scrub_inflight--;


2782         if (dump_opt['b'] >= 5 && bp->blk_birth > 0) {
2783                 char blkbuf[BP_SPRINTF_LEN];
2784                 snprintf_blkptr(blkbuf, sizeof (blkbuf), bp);
2785                 (void) printf("objset %llu object %llu "
2786                     "level %lld offset 0x%llx %s\n",
2787                     (u_longlong_t)zb->zb_objset,
2788                     (u_longlong_t)zb->zb_object,
2789                     (longlong_t)zb->zb_level,
2790                     (u_longlong_t)blkid2offset(dnp, bp, zb),
2791                     blkbuf);
2792         }
2793 
2794         if (BP_IS_HOLE(bp))
2795                 return (0);
2796 
2797         type = BP_GET_TYPE(bp);
2798 
2799         zdb_count_block(zcb, zilog, bp,
2800             (type & DMU_OT_NEWTYPE) ? ZDB_OT_OTHER : type);
2801 
2802         is_metadata = (BP_GET_LEVEL(bp) != 0 || DMU_OT_IS_METADATA(type));
2803 
2804         if (!BP_IS_EMBEDDED(bp) &&
2805             (dump_opt['c'] > 1 || (dump_opt['c'] && is_metadata))) {
2806                 size_t size = BP_GET_PSIZE(bp);
2807                 abd_t *abd = abd_alloc(size, B_FALSE);
2808                 int flags = ZIO_FLAG_CANFAIL | ZIO_FLAG_SCRUB | ZIO_FLAG_RAW;
2809 
2810                 /* If it's an intent log block, failure is expected. */
2811                 if (zb->zb_level == ZB_ZIL_LEVEL)
2812                         flags |= ZIO_FLAG_SPECULATIVE;
2813 
2814                 mutex_enter(&spa->spa_scrub_lock);
2815                 while (spa->spa_scrub_inflight > max_inflight)
2816                         cv_wait(&spa->spa_scrub_io_cv, &spa->spa_scrub_lock);
2817                 spa->spa_scrub_inflight++;
2818                 mutex_exit(&spa->spa_scrub_lock);
2819 
2820                 zio_nowait(zio_read(NULL, spa, bp, abd, size,
2821                     zdb_blkptr_done, zcb, ZIO_PRIORITY_ASYNC_READ, flags, zb));
2822         }


2885                 if (ddb.ddb_class == DDT_CLASS_UNIQUE)
2886                         return;
2887 
2888                 ASSERT(ddt_phys_total_refcnt(&dde) > 1);
2889 
2890                 for (int p = 0; p < DDT_PHYS_TYPES; p++, ddp++) {
2891                         if (ddp->ddp_phys_birth == 0)
2892                                 continue;
2893                         ddt_bp_create(ddb.ddb_checksum,
2894                             &dde.dde_key, ddp, &blk);
2895                         if (p == DDT_PHYS_DITTO) {
2896                                 zdb_count_block(zcb, NULL, &blk, ZDB_OT_DITTO);
2897                         } else {
2898                                 zcb->zcb_dedup_asize +=
2899                                     BP_GET_ASIZE(&blk) * (ddp->ddp_refcnt - 1);
2900                                 zcb->zcb_dedup_blocks++;
2901                         }
2902                 }
2903                 if (!dump_opt['L']) {
2904                         ddt_t *ddt = spa->spa_ddt[ddb.ddb_checksum];
2905                         ddt_enter(ddt);
2906                         VERIFY(ddt_lookup(ddt, &blk, B_TRUE) != NULL);
2907                         ddt_exit(ddt);
2908                 }
2909         }
2910 
2911         ASSERT(error == ENOENT);
2912 }
2913 
2914 /* ARGSUSED */
2915 static void
2916 claim_segment_impl_cb(uint64_t inner_offset, vdev_t *vd, uint64_t offset,
2917     uint64_t size, void *arg)
2918 {
2919         /*
2920          * This callback was called through a remap from
2921          * a device being removed. Therefore, the vdev that
2922          * this callback is applied to is a concrete
2923          * vdev.
2924          */
2925         ASSERT(vdev_is_concrete(vd));
2926 
2927         VERIFY0(metaslab_claim_impl(vd, offset, size,
2928             spa_first_txg(vd->vdev_spa)));
2929 }
2930 
2931 static void
2932 claim_segment_cb(void *arg, uint64_t offset, uint64_t size)
2933 {
2934         vdev_t *vd = arg;
2935 
2936         vdev_indirect_ops.vdev_op_remap(vd, offset, size,
2937             claim_segment_impl_cb, NULL);
2938 }
2939 
2940 /*
2941  * After accounting for all allocated blocks that are directly referenced,
2942  * we might have missed a reference to a block from a partially complete
2943  * (and thus unused) indirect mapping object. We perform a secondary pass
2944  * through the metaslabs we have already mapped and claim the destination
2945  * blocks.
2946  */
2947 static void
2948 zdb_claim_removing(spa_t *spa, zdb_cb_t *zcb)
2949 {
2950         if (spa->spa_vdev_removal == NULL)
2951                 return;
2952 
2953         spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
2954 
2955         spa_vdev_removal_t *svr = spa->spa_vdev_removal;
2956         vdev_t *vd = svr->svr_vdev;
2957         vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping;
2958 
2959         for (uint64_t msi = 0; msi < vd->vdev_ms_count; msi++) {
2960                 metaslab_t *msp = vd->vdev_ms[msi];
2961 
2962                 if (msp->ms_start >= vdev_indirect_mapping_max_offset(vim))
2963                         break;
2964 
2965                 ASSERT0(range_tree_space(svr->svr_allocd_segs));
2966 
2967                 if (msp->ms_sm != NULL) {
2968                         VERIFY0(space_map_load(msp->ms_sm,
2969                             svr->svr_allocd_segs, SM_ALLOC));
2970 
2971                         /*
2972                          * Clear everything past what has been synced,
2973                          * because we have not allocated mappings for it yet.
2974                          */
2975                         range_tree_clear(svr->svr_allocd_segs,
2976                             vdev_indirect_mapping_max_offset(vim),
2977                             msp->ms_sm->sm_start + msp->ms_sm->sm_size -
2978                             vdev_indirect_mapping_max_offset(vim));
2979                 }
2980 
2981                 zcb->zcb_removing_size +=
2982                     range_tree_space(svr->svr_allocd_segs);
2983                 range_tree_vacate(svr->svr_allocd_segs, claim_segment_cb, vd);
2984         }
2985 
2986         spa_config_exit(spa, SCL_CONFIG, FTAG);
2987 }
2988 
2989 /*
2990  * vim_idxp is an in-out parameter which (for indirect vdevs) is the
2991  * index in vim_entries that has the first entry in this metaslab.  On
2992  * return, it will be set to the first entry after this metaslab.
2993  */
2994 static void
2995 zdb_leak_init_ms(metaslab_t *msp, uint64_t *vim_idxp)
2996 {
2997         metaslab_group_t *mg = msp->ms_group;
2998         vdev_t *vd = mg->mg_vd;
2999         vdev_t *rvd = vd->vdev_spa->spa_root_vdev;
3000 
3001         mutex_enter(&msp->ms_lock);
3002         metaslab_unload(msp);
3003 
3004         /*
3005          * We don't want to spend the CPU manipulating the size-ordered
3006          * tree, so clear the range_tree ops.
3007          */
3008         msp->ms_tree->rt_ops = NULL;
3009 
3010         (void) fprintf(stderr,
3011             "\rloading vdev %llu of %llu, metaslab %llu of %llu ...",
3012             (longlong_t)vd->vdev_id,
3013             (longlong_t)rvd->vdev_children,
3014             (longlong_t)msp->ms_id,
3015             (longlong_t)vd->vdev_ms_count);
3016 
3017         /*
3018          * For leak detection, we overload the metaslab ms_tree to
3019          * contain allocated segments instead of free segments. As a
3020          * result, we can't use the normal metaslab_load/unload
3021          * interfaces.
3022          */
3023         if (vd->vdev_ops == &vdev_indirect_ops) {
3024                 vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping;
3025                 for (; *vim_idxp < vdev_indirect_mapping_num_entries(vim);
3026                     (*vim_idxp)++) {
3027                         vdev_indirect_mapping_entry_phys_t *vimep =
3028                             &vim->vim_entries[*vim_idxp];
3029                         uint64_t ent_offset = DVA_MAPPING_GET_SRC_OFFSET(vimep);
3030                         uint64_t ent_len = DVA_GET_ASIZE(&vimep->vimep_dst);
3031                         ASSERT3U(ent_offset, >=, msp->ms_start);
3032                         if (ent_offset >= msp->ms_start + msp->ms_size)
3033                                 break;
3034 
3035                         /*
3036                          * Mappings do not cross metaslab boundaries,
3037                          * because we create them by walking the metaslabs.
3038                          */
3039                         ASSERT3U(ent_offset + ent_len, <=,
3040                             msp->ms_start + msp->ms_size);
3041                         range_tree_add(msp->ms_tree, ent_offset, ent_len);
3042                 }
3043         } else if (msp->ms_sm != NULL) {
3044                 VERIFY0(space_map_load(msp->ms_sm, msp->ms_tree, SM_ALLOC));
3045         }
3046 
3047         if (!msp->ms_loaded) {
3048                 msp->ms_loaded = B_TRUE;
3049         }
3050         mutex_exit(&msp->ms_lock);
3051 }
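The overloading described in the comment above amounts to set subtraction: load the metaslab's tree with ALLOCATED segments, delete everything the traversal claims, and whatever survives is a leak. A toy model with a bitmap of SPA_MINBLOCKSIZE-sized units standing in for the range tree (none of these names are real ZFS interfaces):

```c
#include <assert.h>
#include <string.h>

#define TOY_UNITS 64

typedef struct {
        unsigned char alloc[TOY_UNITS]; /* 1 = allocated and unclaimed */
} toy_ms_t;

/* Load `len` units starting at `start` as allocated (the SM_ALLOC load). */
static void
toy_load_alloc(toy_ms_t *ms, int start, int len)
{
        memset(ms->alloc + start, 1, len);
}

/* A claimed block is removed from the tree. */
static void
toy_claim(toy_ms_t *ms, int start, int len)
{
        memset(ms->alloc + start, 0, len);
}

/* Whatever remains after the traversal is reported as leaked. */
static int
toy_leaked_units(const toy_ms_t *ms)
{
        int n = 0;
        for (int i = 0; i < TOY_UNITS; i++)
                n += ms->alloc[i];
        return (n);
}
```

In zdb the "delete on claim" step happens via zio_claim() against the overloaded ms_tree, and zdb_leak_fini() vacates what is left, calling zdb_leak() on each surviving segment.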
3052 
3053 /* ARGSUSED */
3054 static int
3055 increment_indirect_mapping_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx)
3056 {
3057         zdb_cb_t *zcb = arg;
3058         spa_t *spa = zcb->zcb_spa;
3059         vdev_t *vd;
3060         const dva_t *dva = &bp->blk_dva[0];
3061 
3062         ASSERT(!dump_opt['L']);
3063         ASSERT3U(BP_GET_NDVAS(bp), ==, 1);
3064 
3065         spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER);
3066         vd = vdev_lookup_top(zcb->zcb_spa, DVA_GET_VDEV(dva));
3067         ASSERT3P(vd, !=, NULL);
3068         spa_config_exit(spa, SCL_VDEV, FTAG);
3069 
3070         ASSERT(vd->vdev_indirect_config.vic_mapping_object != 0);
3071         ASSERT3P(zcb->zcb_vd_obsolete_counts[vd->vdev_id], !=, NULL);
3072 
3073         vdev_indirect_mapping_increment_obsolete_count(
3074             vd->vdev_indirect_mapping,
3075             DVA_GET_OFFSET(dva), DVA_GET_ASIZE(dva),
3076             zcb->zcb_vd_obsolete_counts[vd->vdev_id]);
3077 
3078         return (0);
3079 }
3080 
3081 static uint32_t *
3082 zdb_load_obsolete_counts(vdev_t *vd)
3083 {
3084         vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping;
3085         spa_t *spa = vd->vdev_spa;
3086         spa_condensing_indirect_phys_t *scip =
3087             &spa->spa_condensing_indirect_phys;
3088         uint32_t *counts;
3089 
3090         EQUIV(vdev_obsolete_sm_object(vd) != 0, vd->vdev_obsolete_sm != NULL);
3091         counts = vdev_indirect_mapping_load_obsolete_counts(vim);
3092         if (vd->vdev_obsolete_sm != NULL) {
3093                 vdev_indirect_mapping_load_obsolete_spacemap(vim, counts,
3094                     vd->vdev_obsolete_sm);
3095         }
3096         if (scip->scip_vdev == vd->vdev_id &&
3097             scip->scip_prev_obsolete_sm_object != 0) {
3098                 space_map_t *prev_obsolete_sm = NULL;
3099                 VERIFY0(space_map_open(&prev_obsolete_sm, spa->spa_meta_objset,
3100                     scip->scip_prev_obsolete_sm_object, 0, vd->vdev_asize, 0));
3101                 space_map_update(prev_obsolete_sm);
3102                 vdev_indirect_mapping_load_obsolete_spacemap(vim, counts,
3103                     prev_obsolete_sm);
3104                 space_map_close(prev_obsolete_sm);
3105         }
3106         return (counts);
3107 }
3108 
3109 static void
3110 zdb_leak_init(spa_t *spa, zdb_cb_t *zcb)
3111 {
3112         zcb->zcb_spa = spa;
3113 
3114         if (!dump_opt['L']) {
3115                 dsl_pool_t *dp = spa->spa_dsl_pool;
3116                 vdev_t *rvd = spa->spa_root_vdev;
3117 
3118                 /*
3119                  * We are going to be changing the meaning of the metaslab's
3120                  * ms_tree.  Ensure that the allocator doesn't try to
3121                  * use the tree.
3122                  */
3123                 spa->spa_normal_class->mc_ops = &zdb_metaslab_ops;
3124                 spa->spa_log_class->mc_ops = &zdb_metaslab_ops;
3125 
3126                 zcb->zcb_vd_obsolete_counts =
3127                     umem_zalloc(rvd->vdev_children * sizeof (uint32_t *),
3128                     UMEM_NOFAIL);
3129 
3130 
3131                 for (uint64_t c = 0; c < rvd->vdev_children; c++) {
3132                         vdev_t *vd = rvd->vdev_child[c];
3133                         uint64_t vim_idx = 0;
3134 
3135                         ASSERT3U(c, ==, vd->vdev_id);
3136 
3137                         /*
3138                          * Note: we don't check for mapping leaks on
3139                          * removing vdevs because their ms_trees are
3140                          * used to look for leaks in allocated space.
3141                          */
3142                         if (vd->vdev_ops == &vdev_indirect_ops) {
3143                                 zcb->zcb_vd_obsolete_counts[c] =
3144                                     zdb_load_obsolete_counts(vd);
3145 
3146                                 /*
3147                                  * Normally, indirect vdevs don't have any
3148                                  * metaslabs.  We want to set them up for
3149                                  * zio_claim().
3150                                  */
3151                                 VERIFY0(vdev_metaslab_init(vd, 0));
3152                         }
3153 
3154                         for (uint64_t m = 0; m < vd->vdev_ms_count; m++) {
3155                                 zdb_leak_init_ms(vd->vdev_ms[m], &vim_idx);
3156                         }
3157                         if (vd->vdev_ops == &vdev_indirect_ops) {
3158                                 ASSERT3U(vim_idx, ==,
3159                                     vdev_indirect_mapping_num_entries(
3160                                     vd->vdev_indirect_mapping));
3161                         }
3162                 }
3163                 (void) fprintf(stderr, "\n");
3164 
3165                 if (bpobj_is_open(&dp->dp_obsolete_bpobj)) {
3166                         ASSERT(spa_feature_is_enabled(spa,
3167                             SPA_FEATURE_DEVICE_REMOVAL));
3168                         (void) bpobj_iterate_nofree(&dp->dp_obsolete_bpobj,
3169                             increment_indirect_mapping_cb, zcb, NULL);
3170                 }
3171         }
3172 
3173         spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
3174 
3175         zdb_ddt_leak_init(spa, zcb);
3176 
3177         spa_config_exit(spa, SCL_CONFIG, FTAG);
3178 }
3179 
3180 static boolean_t
3181 zdb_check_for_obsolete_leaks(vdev_t *vd, zdb_cb_t *zcb)
3182 {
3183         boolean_t leaks = B_FALSE;
3184         vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping;
3185         uint64_t total_leaked = 0;
3186 
3187         ASSERT(vim != NULL);
3188 
3189         for (uint64_t i = 0; i < vdev_indirect_mapping_num_entries(vim); i++) {
3190                 vdev_indirect_mapping_entry_phys_t *vimep =
3191                     &vim->vim_entries[i];
3192                 uint64_t obsolete_bytes = 0;
3193                 uint64_t offset = DVA_MAPPING_GET_SRC_OFFSET(vimep);
3194                 metaslab_t *msp = vd->vdev_ms[offset >> vd->vdev_ms_shift];
3195 
3196                 /*
3197                  * This is not very efficient but it's easy to
3198                  * verify correctness.
3199                  */
3200                 for (uint64_t inner_offset = 0;
3201                     inner_offset < DVA_GET_ASIZE(&vimep->vimep_dst);
3202                     inner_offset += 1 << vd->vdev_ashift) {
3203                         if (range_tree_contains(msp->ms_tree,
3204                             offset + inner_offset, 1 << vd->vdev_ashift)) {
3205                                 obsolete_bytes += 1 << vd->vdev_ashift;
3206                         }
3207                 }
3208 
3209                 int64_t bytes_leaked = obsolete_bytes -
3210                     zcb->zcb_vd_obsolete_counts[vd->vdev_id][i];
3211                 ASSERT3U(DVA_GET_ASIZE(&vimep->vimep_dst), >=,
3212                     zcb->zcb_vd_obsolete_counts[vd->vdev_id][i]);
3213                 if (bytes_leaked != 0 &&
3214                     (vdev_obsolete_counts_are_precise(vd) ||
3215                     dump_opt['d'] >= 5)) {
3216                         (void) printf("obsolete indirect mapping count "
3217                             "mismatch on %llu:%llx:%llx : %llx bytes leaked\n",
3218                             (u_longlong_t)vd->vdev_id,
3219                             (u_longlong_t)DVA_MAPPING_GET_SRC_OFFSET(vimep),
3220                             (u_longlong_t)DVA_GET_ASIZE(&vimep->vimep_dst),
3221                             (u_longlong_t)bytes_leaked);
3222                 }
3223                 total_leaked += ABS(bytes_leaked);
3224         }
3225 
3226         if (!vdev_obsolete_counts_are_precise(vd) && total_leaked > 0) {
3227                 int pct_leaked = total_leaked * 100 /
3228                     vdev_indirect_mapping_bytes_mapped(vim);
3229                 (void) printf("cannot verify obsolete indirect mapping "
3230                     "counts of vdev %llu because precise feature was not "
3231                     "enabled when it was removed: %d%% (%llx bytes) of mapping "
3232                     "unreferenced\n",
3233                     (u_longlong_t)vd->vdev_id, pct_leaked,
3234                     (u_longlong_t)total_leaked);
3235         } else if (total_leaked > 0) {
3236                 (void) printf("obsolete indirect mapping count mismatch "
3237                     "for vdev %llu -- %llx total bytes mismatched\n",
3238                     (u_longlong_t)vd->vdev_id,
3239                     (u_longlong_t)total_leaked);
3240                 leaks |= B_TRUE;
3241         }
3242 
3243         vdev_indirect_mapping_free_obsolete_counts(vim,
3244             zcb->zcb_vd_obsolete_counts[vd->vdev_id]);
3245         zcb->zcb_vd_obsolete_counts[vd->vdev_id] = NULL;
3246 
3247         return (leaks);
3248 }
3249 
3250 static boolean_t
3251 zdb_leak_fini(spa_t *spa, zdb_cb_t *zcb)
3252 {
3253         boolean_t leaks = B_FALSE;
3254         if (!dump_opt['L']) {
3255                 vdev_t *rvd = spa->spa_root_vdev;
3256                 for (unsigned c = 0; c < rvd->vdev_children; c++) {
3257                         vdev_t *vd = rvd->vdev_child[c];
3258                         metaslab_group_t *mg = vd->vdev_mg;
3259 
3260                         if (zcb->zcb_vd_obsolete_counts[c] != NULL) {
3261                                 leaks |= zdb_check_for_obsolete_leaks(vd, zcb);
3262                         }
3263 
3264                         for (uint64_t m = 0; m < vd->vdev_ms_count; m++) {
3265                                 metaslab_t *msp = vd->vdev_ms[m];
3266                                 ASSERT3P(mg, ==, msp->ms_group);
3267 
3268                                 /*
3269                                  * The ms_tree has been overloaded to
3270                                  * contain allocated segments. Now that we
3271                                  * finished traversing all blocks, any
3272                                  * block that remains in the ms_tree
3273                                  * represents an allocated block that we
3274                                  * did not claim during the traversal.
3275                                  * Claimed blocks would have been removed
3276                                  * from the ms_tree.  For indirect vdevs,
3277                                  * space remaining in the tree represents
3278                                  * parts of the mapping that are not
3279                                  * referenced, which is not a bug.
3280                                  */
3281                                 if (vd->vdev_ops == &vdev_indirect_ops) {
3282                                         range_tree_vacate(msp->ms_tree,
3283                                             NULL, NULL);
3284                                 } else {
3285                                         range_tree_vacate(msp->ms_tree,
3286                                             zdb_leak, vd);
3287                                 }
3288 
3289                                 if (msp->ms_loaded) {
3290                                         msp->ms_loaded = B_FALSE;
3291                                 }
3292                         }
3293                 }
3294 
3295                 umem_free(zcb->zcb_vd_obsolete_counts,
3296                     rvd->vdev_children * sizeof (uint32_t *));
3297                 zcb->zcb_vd_obsolete_counts = NULL;
3298         }
3299         return (leaks);
3300 }
3301 
3302 /* ARGSUSED */
3303 static int
3304 count_block_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx)
3305 {
3306         zdb_cb_t *zcb = arg;
3307 
3308         if (dump_opt['b'] >= 5) {
3309                 char blkbuf[BP_SPRINTF_LEN];
3310                 snprintf_blkptr(blkbuf, sizeof (blkbuf), bp);
3311                 (void) printf("[%s] %s\n",
3312                     "deferred free", blkbuf);
3313         }
3314         zdb_count_block(zcb, NULL, bp, ZDB_OT_DEFERRED);
3315         return (0);
3316 }
3317 
3318 static int
3319 dump_block_stats(spa_t *spa)
3320 {
3321         zdb_cb_t zcb;
3322         zdb_blkstats_t *zb, *tzb;
3323         uint64_t norm_alloc, norm_space, total_alloc, total_found;
3324         int flags = TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA | TRAVERSE_HARD;
3325         boolean_t leaks = B_FALSE;
3326 
3327         bzero(&zcb, sizeof (zcb));
3328         (void) printf("\nTraversing all blocks %s%s%s%s%s...\n\n",
3329             (dump_opt['c'] || !dump_opt['L']) ? "to verify " : "",
3330             (dump_opt['c'] == 1) ? "metadata " : "",
3331             dump_opt['c'] ? "checksums " : "",
3332             (dump_opt['c'] && !dump_opt['L']) ? "and verify " : "",
3333             !dump_opt['L'] ? "nothing leaked " : "");
3334 
3335         /*
3336          * Load all space maps as SM_ALLOC maps, then traverse the pool
3337          * claiming each block we discover.  If the pool is perfectly
3338          * consistent, the space maps will be empty when we're done.
3339          * Anything left over is a leak; any block we can't claim (because
3340          * it's not part of any space map) is a double allocation,
3341          * reference to a freed block, or an unclaimed log block.
3342          */
3343         zdb_leak_init(spa, &zcb);
3344 
3345         /*
3346          * If there's a deferred-free bplist, process that first.
3347          */
3348         (void) bpobj_iterate_nofree(&spa->spa_deferred_bpobj,
3349             count_block_cb, &zcb, NULL);
3350 
3351         if (spa_version(spa) >= SPA_VERSION_DEADLISTS) {
3352                 (void) bpobj_iterate_nofree(&spa->spa_dsl_pool->dp_free_bpobj,
3353                     count_block_cb, &zcb, NULL);
3354         }
3355 
3356         zdb_claim_removing(spa, &zcb);
3357 
3358         if (spa_feature_is_active(spa, SPA_FEATURE_ASYNC_DESTROY)) {
3359                 VERIFY3U(0, ==, bptree_iterate(spa->spa_meta_objset,
3360                     spa->spa_dsl_pool->dp_bptree_obj, B_FALSE, count_block_cb,
3361                     &zcb, NULL));
3362         }
3363 
3364         if (dump_opt['c'] > 1)
3365                 flags |= TRAVERSE_PREFETCH_DATA;
3366 
3367         zcb.zcb_totalasize = metaslab_class_get_alloc(spa_normal_class(spa));
3368         zcb.zcb_start = zcb.zcb_lastprint = gethrtime();
3369         zcb.zcb_haderrors |= traverse_pool(spa, 0, flags, zdb_blkptr_cb, &zcb);
3370 
3371         /*
3372          * If we've traversed the data blocks then we need to wait for those
3373          * I/Os to complete. We leverage "The Godfather" zio to wait on
3374          * all async I/Os to complete.
3375          */
3376         if (dump_opt['c']) {
3377                 for (int i = 0; i < max_ncpus; i++) {
3378                         (void) zio_wait(spa->spa_async_zio_root[i]);
3379                         spa->spa_async_zio_root[i] = zio_root(spa, NULL, NULL,
3380                             ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE |
3381                             ZIO_FLAG_GODFATHER);
3382                 }
3383         }
3384 
3385         if (zcb.zcb_haderrors) {
3386                 (void) printf("\nError counts:\n\n");
3387                 (void) printf("\t%5s  %s\n", "errno", "count");
3388                 for (int e = 0; e < 256; e++) {
3389                         if (zcb.zcb_errors[e] != 0) {
3390                                 (void) printf("\t%5d  %llu\n",
3391                                     e, (u_longlong_t)zcb.zcb_errors[e]);
3392                         }
3393                 }
3394         }
3395 
3396         /*
3397          * Report any leaked segments.
3398          */
3399         leaks |= zdb_leak_fini(spa, &zcb);
3400 
3401         tzb = &zcb.zcb_type[ZB_TOTAL][ZDB_OT_TOTAL];
3402 
3403         norm_alloc = metaslab_class_get_alloc(spa_normal_class(spa));
3404         norm_space = metaslab_class_get_space(spa_normal_class(spa));
3405 
3406         total_alloc = norm_alloc + metaslab_class_get_alloc(spa_log_class(spa));
3407         total_found = tzb->zb_asize - zcb.zcb_dedup_asize +
3408             zcb.zcb_removing_size;
3409 
3410         if (total_found == total_alloc) {
3411                 if (!dump_opt['L'])
3412                         (void) printf("\n\tNo leaks (block sum matches space"
3413                             " maps exactly)\n");
3414         } else {
3415                 (void) printf("block traversal size %llu != alloc %llu "
3416                     "(%s %lld)\n",
3417                     (u_longlong_t)total_found,
3418                     (u_longlong_t)total_alloc,
3419                     (dump_opt['L']) ? "unreachable" : "leaked",
3420                     (longlong_t)(total_alloc - total_found));
3421                 leaks = B_TRUE;
3422         }
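The comparison above reduces to a single identity: the traversal's block sum, minus the extra copies accounted to dedup, plus the segments still owned by an in-progress device removal, should equal the allocation reported by the space maps. A trivial sketch of that accounting (hypothetical helper, not part of zdb):

```c
#include <assert.h>
#include <stdint.h>

/*
 * total_found as dump_block_stats() computes it: asize seen by the
 * traversal, minus dedup savings, plus space still being evacuated.
 */
static uint64_t
zdb_total_found(uint64_t zb_asize, uint64_t dedup_asize, uint64_t removing)
{
        return (zb_asize - dedup_asize + removing);
}
```

If this does not match norm_alloc + log_alloc, the difference is reported as leaked (or, with -L, merely unreachable, since no leak tracking was done).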
3423 
3424         if (tzb->zb_count == 0)
3425                 return (2);
3426 
3427         (void) printf("\n");
3428         (void) printf("\tbp count:      %10llu\n",
3429             (u_longlong_t)tzb->zb_count);
3430         (void) printf("\tganged count:  %10llu\n",
3431             (longlong_t)tzb->zb_gangs);
3432         (void) printf("\tbp logical:    %10llu      avg: %6llu\n",
3433             (u_longlong_t)tzb->zb_lsize,
3434             (u_longlong_t)(tzb->zb_lsize / tzb->zb_count));
3435         (void) printf("\tbp physical:   %10llu      avg:"
3436             " %6llu     compression: %6.2f\n",
3437             (u_longlong_t)tzb->zb_psize,
3438             (u_longlong_t)(tzb->zb_psize / tzb->zb_count),
3439             (double)tzb->zb_lsize / tzb->zb_psize);
3440         (void) printf("\tbp allocated:  %10llu      avg:"
3441             " %6llu     compression: %6.2f\n",
3442             (u_longlong_t)tzb->zb_asize,
3443             (u_longlong_t)(tzb->zb_asize / tzb->zb_count),
3444             (double)tzb->zb_lsize / tzb->zb_asize);
3445         (void) printf("\tbp deduped:    %10llu    ref>1:"
3446             " %6llu   deduplication: %6.2f\n",
3447             (u_longlong_t)zcb.zcb_dedup_asize,
3448             (u_longlong_t)zcb.zcb_dedup_blocks,
3449             (double)zcb.zcb_dedup_asize / tzb->zb_asize + 1.0);
3450         (void) printf("\tSPA allocated: %10llu     used: %5.2f%%\n",
3451             (u_longlong_t)norm_alloc, 100.0 * norm_alloc / norm_space);
3452 
3453         for (bp_embedded_type_t i = 0; i < NUM_BP_EMBEDDED_TYPES; i++) {
3454                 if (zcb.zcb_embedded_blocks[i] == 0)
3455                         continue;
3456                 (void) printf("\n");
3457                 (void) printf("\tadditional, non-pointer bps of type %u: "
3458                     "%10llu\n",
3459                     i, (u_longlong_t)zcb.zcb_embedded_blocks[i]);
3460 
3461                 if (dump_opt['b'] >= 3) {
3462                         (void) printf("\t number of (compressed) bytes:  "
3463                             "number of bps\n");
3464                         dump_histogram(zcb.zcb_embedded_histogram[i],
3465                             sizeof (zcb.zcb_embedded_histogram[i]) /
3466                             sizeof (zcb.zcb_embedded_histogram[i][0]), 0);
3467                 }
3468         }
3469 
3470         if (tzb->zb_ditto_samevdev != 0) {
3471                 (void) printf("\tDittoed blocks on same vdev: %llu\n",
3472                     (longlong_t)tzb->zb_ditto_samevdev);
3473         }
3474 
3475         for (uint64_t v = 0; v < spa->spa_root_vdev->vdev_children; v++) {
3476                 vdev_t *vd = spa->spa_root_vdev->vdev_child[v];
3477                 vdev_indirect_mapping_t *vim = vd->vdev_indirect_mapping;
3478 
3479                 if (vim == NULL) {
3480                         continue;
3481                 }
3482 
3483                 char mem[32];
3484                 zdb_nicenum(vdev_indirect_mapping_size(vim), mem,
3485                     sizeof (mem));
3486 
3487                 (void) printf("\tindirect vdev id %llu has %llu segments "
3488                     "(%s in memory)\n",
3489                     (longlong_t)vd->vdev_id,
3490                     (longlong_t)vdev_indirect_mapping_num_entries(vim), mem);
3491         }
3492 
3493         if (dump_opt['b'] >= 2) {
3494                 int l, t, level;
3495                 (void) printf("\nBlocks\tLSIZE\tPSIZE\tASIZE"
3496                     "\t  avg\t comp\t%%Total\tType\n");
3497 
3498                 for (t = 0; t <= ZDB_OT_TOTAL; t++) {
3499                         char csize[32], lsize[32], psize[32], asize[32];
3500                         char avg[32], gang[32];
3501                         const char *typename;
3502 
3503                         /* make sure nicenum has enough space */
3504                         CTASSERT(sizeof (csize) >= NN_NUMBUF_SZ);
3505                         CTASSERT(sizeof (lsize) >= NN_NUMBUF_SZ);
3506                         CTASSERT(sizeof (psize) >= NN_NUMBUF_SZ);
3507                         CTASSERT(sizeof (asize) >= NN_NUMBUF_SZ);
3508                         CTASSERT(sizeof (avg) >= NN_NUMBUF_SZ);
3509                         CTASSERT(sizeof (gang) >= NN_NUMBUF_SZ);
3510 
3511                         if (t < DMU_OT_NUMTYPES)
3512                                 typename = dmu_ot[t].ot_name;


3605 static int
3606 zdb_ddt_add_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,
3607     const zbookmark_phys_t *zb, const dnode_phys_t *dnp, void *arg)
3608 {
3609         avl_tree_t *t = arg;
3610         avl_index_t where;
3611         zdb_ddt_entry_t *zdde, zdde_search;
3612 
3613         if (bp == NULL || BP_IS_HOLE(bp) || BP_IS_EMBEDDED(bp))
3614                 return (0);
3615 
3616         if (dump_opt['S'] > 1 && zb->zb_level == ZB_ROOT_LEVEL) {
3617                 (void) printf("traversing objset %llu, %llu objects, "
3618                     "%lu blocks so far\n",
3619                     (u_longlong_t)zb->zb_objset,
3620                     (u_longlong_t)BP_GET_FILL(bp),
3621                     avl_numnodes(t));
3622         }
3623 
3624         if (BP_IS_HOLE(bp) || BP_GET_CHECKSUM(bp) == ZIO_CHECKSUM_OFF ||
3625             BP_GET_LEVEL(bp) > 0 || DMU_OT_IS_METADATA(BP_GET_TYPE(bp)))
3626                 return (0);
3627 
3628         ddt_key_fill(&zdde_search.zdde_key, bp);
3629 
3630         zdde = avl_find(t, &zdde_search, &where);
3631 
3632         if (zdde == NULL) {
3633                 zdde = umem_zalloc(sizeof (*zdde), UMEM_NOFAIL);
3634                 zdde->zdde_key = zdde_search.zdde_key;
3635                 avl_insert(t, zdde, where);
3636         }
3637 
3638         zdde->zdde_ref_blocks += 1;
3639         zdde->zdde_ref_lsize += BP_GET_LSIZE(bp);
3640         zdde->zdde_ref_psize += BP_GET_PSIZE(bp);
3641         zdde->zdde_ref_dsize += bp_get_dsize_sync(spa, bp);
3642 
3643         return (0);
3644 }
3645 
3646 static void
3647 dump_simulated_ddt(spa_t *spa)
3648 {
3649         avl_tree_t t;
3650         void *cookie = NULL;
3651         zdb_ddt_entry_t *zdde;
3652         ddt_histogram_t ddh_total;
3653         ddt_stat_t dds_total;
3654 
3655         bzero(&ddh_total, sizeof (ddh_total));
3656         bzero(&dds_total, sizeof (dds_total));
3657         avl_create(&t, ddt_entry_compare,
3658             sizeof (zdb_ddt_entry_t), offsetof(zdb_ddt_entry_t, zdde_node));
3659 
3660         spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
3661 
3662         (void) traverse_pool(spa, 0, TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA,
3663             zdb_ddt_add_cb, &t);
3664 
3665         spa_config_exit(spa, SCL_CONFIG, FTAG);
3666 
3667         while ((zdde = avl_destroy_nodes(&t, &cookie)) != NULL) {
3668                 ddt_stat_t dds;
3669                 uint64_t refcnt = zdde->zdde_ref_blocks;
3670                 ASSERT(refcnt != 0);
3671 
3672                 dds.dds_blocks = zdde->zdde_ref_blocks / refcnt;
3673                 dds.dds_lsize = zdde->zdde_ref_lsize / refcnt;
3674                 dds.dds_psize = zdde->zdde_ref_psize / refcnt;
3675                 dds.dds_dsize = zdde->zdde_ref_dsize / refcnt;
3676 
3677                 dds.dds_ref_blocks = zdde->zdde_ref_blocks;
3678                 dds.dds_ref_lsize = zdde->zdde_ref_lsize;
3679                 dds.dds_ref_psize = zdde->zdde_ref_psize;
3680                 dds.dds_ref_dsize = zdde->zdde_ref_dsize;
3681 
3682                 ddt_stat_add(&ddh_total.ddh_stat[highbit64(refcnt) - 1],
3683                     &dds, 0);
3684 
3685                 umem_free(zdde, sizeof (*zdde));
3686         }
3687 
3688         avl_destroy(&t);
3689 
3690         ddt_histogram_stat(&dds_total, &ddh_total);
3691 
3692         (void) printf("Simulated DDT histogram:\n");
3693 
3694         zpool_dump_ddt(&dds_total, &ddh_total);
3695 
3696         dump_dedup_ratio(&dds_total);
3697 }
3698 
3699 static int
3700 verify_device_removal_feature_counts(spa_t *spa)
3701 {
3702         uint64_t dr_feature_refcount = 0;
3703         uint64_t oc_feature_refcount = 0;
3704         uint64_t indirect_vdev_count = 0;
3705         uint64_t precise_vdev_count = 0;
3706         uint64_t obsolete_counts_object_count = 0;
3707         uint64_t obsolete_sm_count = 0;
3708         uint64_t obsolete_counts_count = 0;
3709         uint64_t scip_count = 0;
3710         uint64_t obsolete_bpobj_count = 0;
3711         int ret = 0;
3712 
3713         spa_condensing_indirect_phys_t *scip =
3714             &spa->spa_condensing_indirect_phys;
3715         if (scip->scip_next_mapping_object != 0) {
3716                 vdev_t *vd = spa->spa_root_vdev->vdev_child[scip->scip_vdev];
3717                 ASSERT(scip->scip_prev_obsolete_sm_object != 0);
3718                 ASSERT3P(vd->vdev_ops, ==, &vdev_indirect_ops);
3719 
3720                 (void) printf("Condensing indirect vdev %llu: new mapping "
3721                     "object %llu, prev obsolete sm %llu\n",
3722                     (u_longlong_t)scip->scip_vdev,
3723                     (u_longlong_t)scip->scip_next_mapping_object,
3724                     (u_longlong_t)scip->scip_prev_obsolete_sm_object);
3725                 if (scip->scip_prev_obsolete_sm_object != 0) {
3726                         space_map_t *prev_obsolete_sm = NULL;
3727                         VERIFY0(space_map_open(&prev_obsolete_sm,
3728                             spa->spa_meta_objset,
3729                             scip->scip_prev_obsolete_sm_object,
3730                             0, vd->vdev_asize, 0));
3731                         space_map_update(prev_obsolete_sm);
3732                         dump_spacemap(spa->spa_meta_objset, prev_obsolete_sm);
3733                         (void) printf("\n");
3734                         space_map_close(prev_obsolete_sm);
3735                 }
3736 
3737                 scip_count += 2;
3738         }
3739 
3740         for (uint64_t i = 0; i < spa->spa_root_vdev->vdev_children; i++) {
3741                 vdev_t *vd = spa->spa_root_vdev->vdev_child[i];
3742                 vdev_indirect_config_t *vic = &vd->vdev_indirect_config;
3743 
3744                 if (vic->vic_mapping_object != 0) {
3745                         ASSERT(vd->vdev_ops == &vdev_indirect_ops ||
3746                             vd->vdev_removing);
3747                         indirect_vdev_count++;
3748 
3749                         if (vd->vdev_indirect_mapping->vim_havecounts) {
3750                                 obsolete_counts_count++;
3751                         }
3752                 }
3753                 if (vdev_obsolete_counts_are_precise(vd)) {
3754                         ASSERT(vic->vic_mapping_object != 0);
3755                         precise_vdev_count++;
3756                 }
3757                 if (vdev_obsolete_sm_object(vd) != 0) {
3758                         ASSERT(vic->vic_mapping_object != 0);
3759                         obsolete_sm_count++;
3760                 }
3761         }
3762 
3763         (void) feature_get_refcount(spa,
3764             &spa_feature_table[SPA_FEATURE_DEVICE_REMOVAL],
3765             &dr_feature_refcount);
3766         (void) feature_get_refcount(spa,
3767             &spa_feature_table[SPA_FEATURE_OBSOLETE_COUNTS],
3768             &oc_feature_refcount);
3769 
3770         if (dr_feature_refcount != indirect_vdev_count) {
3771                 ret = 1;
3772                 (void) printf("Number of indirect vdevs (%llu) " \
3773                     "does not match feature count (%llu)\n",
3774                     (u_longlong_t)indirect_vdev_count,
3775                     (u_longlong_t)dr_feature_refcount);
3776         } else {
3777                 (void) printf("Verified device_removal feature refcount " \
3778                     "of %llu is correct\n",
3779                     (u_longlong_t)dr_feature_refcount);
3780         }
3781 
3782         if (zap_contains(spa_meta_objset(spa), DMU_POOL_DIRECTORY_OBJECT,
3783             DMU_POOL_OBSOLETE_BPOBJ) == 0) {
3784                 obsolete_bpobj_count++;
3785         }
3786 
3787 
3788         obsolete_counts_object_count = precise_vdev_count;
3789         obsolete_counts_object_count += obsolete_sm_count;
3790         obsolete_counts_object_count += obsolete_counts_count;
3791         obsolete_counts_object_count += scip_count;
3792         obsolete_counts_object_count += obsolete_bpobj_count;
3793         obsolete_counts_object_count += remap_deadlist_count;
3794 
3795         if (oc_feature_refcount != obsolete_counts_object_count) {
3796                 ret = 1;
3797                 (void) printf("Number of obsolete counts objects (%llu) " \
3798                     "does not match feature count (%llu)\n",
3799                     (u_longlong_t)obsolete_counts_object_count,
3800                     (u_longlong_t)oc_feature_refcount);
3801                 (void) printf("pv:%llu os:%llu oc:%llu sc:%llu "
3802                     "ob:%llu rd:%llu\n",
3803                     (u_longlong_t)precise_vdev_count,
3804                     (u_longlong_t)obsolete_sm_count,
3805                     (u_longlong_t)obsolete_counts_count,
3806                     (u_longlong_t)scip_count,
3807                     (u_longlong_t)obsolete_bpobj_count,
3808                     (u_longlong_t)remap_deadlist_count);
3809         } else {
3810                 (void) printf("Verified indirect_refcount feature refcount " \
3811                     "of %llu is correct\n",
3812                     (u_longlong_t)oc_feature_refcount);
3813         }
3814         return (ret);
3815 }
3816 
3817 static void
3818 dump_zpool(spa_t *spa)
3819 {
3820         dsl_pool_t *dp = spa_get_dsl(spa);
3821         int rc = 0;
3822 
3823         if (dump_opt['S']) {
3824                 dump_simulated_ddt(spa);
3825                 return;
3826         }
3827 
3828         if (!dump_opt['e'] && dump_opt['C'] > 1) {
3829                 (void) printf("\nCached configuration:\n");
3830                 dump_nvlist(spa->spa_config, 8);
3831         }
3832 
3833         if (dump_opt['C'])
3834                 dump_config(spa);
3835 
3836         if (dump_opt['u'])
3837                 dump_uberblock(&spa->spa_uberblock, "\nUberblock:\n", "\n");
3838 
3839         if (dump_opt['D'])
3840                 dump_all_ddts(spa);
3841 
3842         if (dump_opt['d'] > 2 || dump_opt['m'])
3843                 dump_metaslabs(spa);
3844         if (dump_opt['M'])
3845                 dump_metaslab_groups(spa);
3846 
3847         if (dump_opt['d'] || dump_opt['i']) {
3848                 dump_dir(dp->dp_meta_objset);
3849                 if (dump_opt['d'] >= 3) {
3850                         dsl_pool_t *dp = spa->spa_dsl_pool;
3851                         dump_full_bpobj(&spa->spa_deferred_bpobj,
3852                             "Deferred frees", 0);
3853                         if (spa_version(spa) >= SPA_VERSION_DEADLISTS) {
3854                                 dump_full_bpobj(&dp->dp_free_bpobj,
3855                                     "Pool snapshot frees", 0);
3856                         }
3857                         if (bpobj_is_open(&dp->dp_obsolete_bpobj)) {
3858                                 ASSERT(spa_feature_is_enabled(spa,
3859                                     SPA_FEATURE_DEVICE_REMOVAL));
3860                                 dump_full_bpobj(&dp->dp_obsolete_bpobj,
3861                                     "Pool obsolete blocks", 0);
3862                         }
3863 
3864                         if (spa_feature_is_active(spa,
3865                             SPA_FEATURE_ASYNC_DESTROY)) {
3866                                 dump_bptree(spa->spa_meta_objset,
3867                                     dp->dp_bptree_obj,
3868                                     "Pool dataset frees");
3869                         }
3870                         dump_dtl(spa->spa_root_vdev, 0);
3871                 }
3872                 (void) dmu_objset_find(spa_name(spa), dump_one_dir,
3873                     NULL, DS_FIND_SNAPSHOTS | DS_FIND_CHILDREN);
3874 
3875                 for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
3876                         uint64_t refcount;
3877 
3878                         if (!(spa_feature_table[f].fi_flags &
3879                             ZFEATURE_FLAG_PER_DATASET) ||
3880                             !spa_feature_is_enabled(spa, f)) {
3881                                 ASSERT0(dataset_feature_count[f]);
3882                                 continue;
3883                         }
3884                         (void) feature_get_refcount(spa,
3885                             &spa_feature_table[f], &refcount);
3886                         if (dataset_feature_count[f] != refcount) {
3887                                 (void) printf("%s feature refcount mismatch: "
3888                                     "%lld datasets != %lld refcount\n",
3889                                     spa_feature_table[f].fi_uname,
3890                                     (longlong_t)dataset_feature_count[f],
3891                                     (longlong_t)refcount);
3892                                 rc = 2;
3893                         } else {
3894                                 (void) printf("Verified %s feature refcount "
3895                                     "of %llu is correct\n",
3896                                     spa_feature_table[f].fi_uname,
3897                                     (longlong_t)refcount);
3898                         }
3899                 }
3900 
3901                 if (rc == 0) {
3902                         rc = verify_device_removal_feature_counts(spa);
3903                 }
3904         }
3905         if (rc == 0 && (dump_opt['b'] || dump_opt['c']))
3906                 rc = dump_block_stats(spa);
3907 
3908         if (rc == 0)
3909                 rc = verify_spacemap_refcounts(spa);
3910 
3911         if (dump_opt['s'])
3912                 show_pool_stats(spa);
3913 
3914         if (dump_opt['h'])
3915                 dump_history(spa);
3916 
3917         if (rc != 0) {
3918                 dump_debug_buffer();
3919                 exit(rc);
3920         }
3921 }
3922 
3923 #define ZDB_FLAG_CHECKSUM       0x0001
3924 #define ZDB_FLAG_DECOMPRESS     0x0002


4191         BP_SET_BYTEORDER(bp, ZFS_HOST_BYTEORDER);
4192 
4193         spa_config_enter(spa, SCL_STATE, FTAG, RW_READER);
4194         zio = zio_root(spa, NULL, NULL, 0);
4195 
4196         if (vd == vd->vdev_top) {
4197                 /*
4198                  * Treat this as a normal block read.
4199                  */
4200                 zio_nowait(zio_read(zio, spa, bp, pabd, psize, NULL, NULL,
4201                     ZIO_PRIORITY_SYNC_READ,
4202                     ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL));
4203         } else {
4204                 /*
4205                  * Treat this as a vdev child I/O.
4206                  */
4207                 zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pabd,
4208                     psize, ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ,
4209                     ZIO_FLAG_DONT_CACHE | ZIO_FLAG_DONT_QUEUE |
4210                     ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_DONT_RETRY |
4211                     ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW | ZIO_FLAG_OPTIONAL,
4212                     NULL, NULL));
4213         }
4214 
4215         error = zio_wait(zio);
4216         spa_config_exit(spa, SCL_STATE, FTAG);
4217 
4218         if (error) {
4219                 (void) printf("Read of %s failed, error: %d\n", thing, error);
4220                 goto out;
4221         }
4222 
4223         if (flags & ZDB_FLAG_DECOMPRESS) {
4224                 /*
4225                  * We don't know how the data was compressed, so just try
4226                  * every decompress function at every inflated blocksize.
4227                  */
4228                 enum zio_compress c;
4229                 void *pbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
4230                 void *lbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
4231 
4232                 abd_copy_to_buf(pbuf2, pabd, psize);


4531         }
4532 
4533         /*
4534          * ZDB does not typically re-read blocks; therefore limit the ARC
4535          * to 256 MB, which can be used entirely for metadata.
4536          */
4537         zfs_arc_max = zfs_arc_meta_limit = 256 * 1024 * 1024;
4538 
4539         /*
4540          * "zdb -c" uses checksum-verifying scrub i/os which are async reads.
4541          * "zdb -b" uses traversal prefetch which uses async reads.
4542          * For good performance, let several of them be active at once.
4543          */
4544         zfs_vdev_async_read_max_active = 10;
4545 
4546         /*
4547          * Disable reference tracking for better performance.
4548          */
4549         reference_tracking_enable = B_FALSE;
4550 
4551         /*
4552          * Do not fail spa_load when spa_load_verify fails. This is needed
4553          * to load non-idle pools.
4554          */
4555         spa_load_verify_dryrun = B_TRUE;
4556 
4557         kernel_init(FREAD);
4558         g_zfs = libzfs_init();
4559         ASSERT(g_zfs != NULL);
4560 
4561         if (dump_all)
4562                 verbose = MAX(verbose, 1);
4563 
4564         for (c = 0; c < 256; c++) {
4565                 if (dump_all && strchr("AeEFlLOPRSX", c) == NULL)
4566                         dump_opt[c] = 1;
4567                 if (dump_opt[c])
4568                         dump_opt[c] += verbose;
4569         }
4570 
4571         aok = (dump_opt['A'] == 1) || (dump_opt['A'] > 2);
4572         zfs_recover = (dump_opt['A'] > 1);
4573 
4574         argc -= optind;
4575         argv += optind;
4576 




   4  * The contents of this file are subject to the terms of the
   5  * Common Development and Distribution License (the "License").
   6  * You may not use this file except in compliance with the License.
   7  *
   8  * You can obtain a copy of the license at usr/src/OPENSOLARIS.LICENSE
   9  * or http://www.opensolaris.org/os/licensing.
  10  * See the License for the specific language governing permissions
  11  * and limitations under the License.
  12  *
  13  * When distributing Covered Code, include this CDDL HEADER in each
  14  * file and include the License file at usr/src/OPENSOLARIS.LICENSE.
  15  * If applicable, add the following below this CDDL HEADER, with the
  16  * fields enclosed by brackets "[]" replaced with your own identifying
  17  * information: Portions Copyright [yyyy] [name of copyright owner]
  18  *
  19  * CDDL HEADER END
  20  */
  21 
  22 /*
  23  * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
  24  * Copyright (c) 2011, 2016 by Delphix. All rights reserved.
  25  * Copyright (c) 2014 Integros [integros.com]
  26  * Copyright 2017 Nexenta Systems, Inc.
  27  * Copyright 2017 RackTop Systems.
  28  */
  29 
  30 #include <stdio.h>
  31 #include <unistd.h>
  32 #include <stdio_ext.h>
  33 #include <stdlib.h>
  34 #include <ctype.h>
  35 #include <string.h>
  36 #include <errno.h>
  37 #include <sys/zfs_context.h>
  38 #include <sys/spa.h>
  39 #include <sys/spa_impl.h>
  40 #include <sys/dmu.h>
  41 #include <sys/zap.h>
  42 #include <sys/fs/zfs.h>
  43 #include <sys/zfs_znode.h>
  44 #include <sys/zfs_sa.h>
  45 #include <sys/sa.h>
  46 #include <sys/sa_impl.h>
  47 #include <sys/vdev.h>
  48 #include <sys/vdev_impl.h>
  49 #include <sys/metaslab_impl.h>
  50 #include <sys/dmu_objset.h>
  51 #include <sys/dsl_dir.h>
  52 #include <sys/dsl_dataset.h>
  53 #include <sys/dsl_pool.h>
  54 #include <sys/dbuf.h>
  55 #include <sys/zil.h>
  56 #include <sys/zil_impl.h>


  63 #include <sys/arc.h>
  64 #include <sys/ddt.h>
  65 #include <sys/zfeature.h>
  66 #include <sys/abd.h>
  67 #include <sys/blkptr.h>
  68 #include <zfs_comutil.h>
  69 #include <libcmdutils.h>
  70 #undef verify
  71 #include <libzfs.h>
  72 
  73 #include "zdb.h"
  74 
  75 #define ZDB_COMPRESS_NAME(idx) ((idx) < ZIO_COMPRESS_FUNCTIONS ?     \
  76         zio_compress_table[(idx)].ci_name : "UNKNOWN")
  77 #define ZDB_CHECKSUM_NAME(idx) ((idx) < ZIO_CHECKSUM_FUNCTIONS ?     \
  78         zio_checksum_table[(idx)].ci_name : "UNKNOWN")
  79 #define ZDB_OT_NAME(idx) ((idx) < DMU_OT_NUMTYPES ?  \
  80         dmu_ot[(idx)].ot_name : DMU_OT_IS_VALID(idx) ?  \
  81         dmu_ot_byteswap[DMU_OT_BYTESWAP(idx)].ob_name : "UNKNOWN")
  82 #define ZDB_OT_TYPE(idx) ((idx) < DMU_OT_NUMTYPES ? (idx) :          \
  83         (((idx) == DMU_OTN_ZAP_DATA || (idx) == DMU_OTN_ZAP_METADATA) ? \
  84         DMU_OT_ZAP_OTHER : DMU_OT_NUMTYPES))
  85 
  86 #ifndef lint
  87 extern int reference_tracking_enable;
  88 extern boolean_t zfs_recover;
  89 extern uint64_t zfs_arc_max, zfs_arc_meta_limit;
  90 extern int zfs_vdev_async_read_max_active;
  91 extern int aok;
  92 #else
  93 int reference_tracking_enable;
  94 boolean_t zfs_recover;
  95 uint64_t zfs_arc_max, zfs_arc_meta_limit;
  96 int zfs_vdev_async_read_max_active;
  97 int aok;
  98 #endif
  99 
 100 static const char cmdname[] = "zdb";
 101 uint8_t dump_opt[256];
 102 
 103 typedef void object_viewer_t(objset_t *, uint64_t, void *data, size_t size);
 104 
 105 uint64_t *zopt_object = NULL;
 106 static unsigned zopt_objects = 0;
 107 libzfs_handle_t *g_zfs;
 108 uint64_t max_inflight = 1000;
 109 
 110 static void snprintf_blkptr_compact(char *, size_t, const blkptr_t *);
 111 
 112 /*
 113  * These libumem hooks provide a reasonable set of defaults for the allocator's
 114  * debugging facilities.
 115  */
 116 const char *
 117 _umem_debug_init()


 655 
 656         if (vd->vdev_ops->vdev_op_leaf) {
 657                 space_map_t *sm = vd->vdev_dtl_sm;
 658 
 659                 if (sm != NULL &&
 660                     sm->sm_dbuf->db_size == sizeof (space_map_phys_t))
 661                         return (1);
 662                 return (0);
 663         }
 664 
 665         for (unsigned c = 0; c < vd->vdev_children; c++)
 666                 refcount += get_dtl_refcount(vd->vdev_child[c]);
 667         return (refcount);
 668 }
 669 
 670 static int
 671 get_metaslab_refcount(vdev_t *vd)
 672 {
 673         int refcount = 0;
 674 
 675         if (vd->vdev_top == vd && !vd->vdev_removing) {
 676                 for (unsigned m = 0; m < vd->vdev_ms_count; m++) {
 677                         space_map_t *sm = vd->vdev_ms[m]->ms_sm;
 678 
 679                         if (sm != NULL &&
 680                             sm->sm_dbuf->db_size == sizeof (space_map_phys_t))
 681                                 refcount++;
 682                 }
 683         }
 684         for (unsigned c = 0; c < vd->vdev_children; c++)
 685                 refcount += get_metaslab_refcount(vd->vdev_child[c]);
 686 
 687         return (refcount);
 688 }
 689 
 690 static int
 691 verify_spacemap_refcounts(spa_t *spa)
 692 {
 693         uint64_t expected_refcount = 0;
 694         uint64_t actual_refcount;
 695 
 696         (void) feature_get_refcount(spa,
 697             &spa_feature_table[SPA_FEATURE_SPACEMAP_HISTOGRAM],
 698             &expected_refcount);
 699         actual_refcount = get_dtl_refcount(spa->spa_root_vdev);
 700         actual_refcount += get_metaslab_refcount(spa->spa_root_vdev);
 701 
 702         if (expected_refcount != actual_refcount) {
 703                 (void) printf("space map refcount mismatch: expected %lld != "
 704                     "actual %lld\n",
 705                     (longlong_t)expected_refcount,
 706                     (longlong_t)actual_refcount);
 707                 return (2);
 708         }
 709         return (0);
 710 }
 711 
 712 static void
 713 dump_spacemap(objset_t *os, space_map_t *sm)
 714 {
 715         uint64_t alloc, offset, entry;
 716         const char *ddata[] = { "ALLOC", "FREE", "CONDENSE", "INVALID",
 717                             "INVALID", "INVALID", "INVALID", "INVALID" };
 718 
 719         if (sm == NULL)
 720                 return;
 721 
 722         /*
 723          * Print out the freelist entries in both encoded and decoded form.
 724          */
 725         alloc = 0;
 726         for (offset = 0; offset < space_map_length(sm);
 727             offset += sizeof (entry)) {
 728                 uint8_t mapshift = sm->sm_shift;
 729 
 730                 VERIFY0(dmu_read(os, space_map_object(sm), offset,
 731                     sizeof (entry), &entry, DMU_READ_PREFETCH));
 732                 if (SM_DEBUG_DECODE(entry)) {
 733 
 734                         (void) printf("\t    [%6llu] %s: txg %llu, pass %llu\n",
 735                             (u_longlong_t)(offset / sizeof (entry)),
 736                             ddata[SM_DEBUG_ACTION_DECODE(entry)],
 737                             (u_longlong_t)SM_DEBUG_TXG_DECODE(entry),
 738                             (u_longlong_t)SM_DEBUG_SYNCPASS_DECODE(entry));
 739                 } else {
 740                         (void) printf("\t    [%6llu]    %c  range:"
 741                             " %010llx-%010llx  size: %06llx\n",


 806                 dump_metaslab_stats(msp);
 807                 metaslab_unload(msp);
 808                 mutex_exit(&msp->ms_lock);
 809         }
 810 
 811         if (dump_opt['m'] > 1 && sm != NULL &&
 812             spa_feature_is_active(spa, SPA_FEATURE_SPACEMAP_HISTOGRAM)) {
 813                 /*
 814                  * The space map histogram represents free space in chunks
 815                  * of sm_shift (i.e. bucket 0 refers to 2^sm_shift).
 816                  */
 817                 (void) printf("\tOn-disk histogram:\t\tfragmentation %llu\n",
 818                     (u_longlong_t)msp->ms_fragmentation);
 819                 dump_histogram(sm->sm_phys->smp_histogram,
 820                     SPACE_MAP_HISTOGRAM_SIZE, sm->sm_shift);
 821         }
 822 
 823         if (dump_opt['d'] > 5 || dump_opt['m'] > 3) {
 824                 ASSERT(msp->ms_size == (1ULL << vd->vdev_ms_shift));
 825 
 826                 mutex_enter(&msp->ms_lock);
 827                 dump_spacemap(spa->spa_meta_objset, msp->ms_sm);
 828                 mutex_exit(&msp->ms_lock);
 829         }
 830 }
 831 
 832 static void
 833 print_vdev_metaslab_header(vdev_t *vd)
 834 {
 835         (void) printf("\tvdev %10llu\n\t%-10s%5llu   %-19s   %-15s   %-10s\n",
 836             (u_longlong_t)vd->vdev_id,
 837             "metaslabs", (u_longlong_t)vd->vdev_ms_count,
 838             "offset", "spacemap", "free");
 839         (void) printf("\t%15s   %19s   %15s   %10s\n",
 840             "---------------", "-------------------",
 841             "---------------", "-------------");
 842 }
 843 
 844 static void
 845 dump_metaslab_groups(spa_t *spa)
 846 {
 847         vdev_t *rvd = spa->spa_root_vdev;
 848         metaslab_class_t *mc = spa_normal_class(spa);


 866                     (u_longlong_t)tvd->vdev_ms_count);
 867                 if (mg->mg_fragmentation == ZFS_FRAG_INVALID) {
 868                         (void) printf("%3s\n", "-");
 869                 } else {
 870                         (void) printf("%3llu%%\n",
 871                             (u_longlong_t)mg->mg_fragmentation);
 872                 }
 873                 dump_histogram(mg->mg_histogram, RANGE_TREE_HISTOGRAM_SIZE, 0);
 874         }
 875 
 876         (void) printf("\tpool %s\tfragmentation", spa_name(spa));
 877         fragmentation = metaslab_class_fragmentation(mc);
 878         if (fragmentation == ZFS_FRAG_INVALID)
 879                 (void) printf("\t%3s\n", "-");
 880         else
 881                 (void) printf("\t%3llu%%\n", (u_longlong_t)fragmentation);
 882         dump_histogram(mc->mc_histogram, RANGE_TREE_HISTOGRAM_SIZE, 0);
 883 }
 884 
 885 static void
 886 dump_metaslabs(spa_t *spa)
 887 {
 888         vdev_t *vd, *rvd = spa->spa_root_vdev;
 889         uint64_t m, c = 0, children = rvd->vdev_children;
 890 
 891         (void) printf("\nMetaslabs:\n");
 892 
 893         if (!dump_opt['d'] && zopt_objects > 0) {
 894                 c = zopt_object[0];
 895 
 896                 if (c >= children)
 897                         (void) fatal("bad vdev id: %llu", (u_longlong_t)c);
 898 
 899                 if (zopt_objects > 1) {
 900                         vd = rvd->vdev_child[c];
 901                         print_vdev_metaslab_header(vd);
 902 
 903                         for (m = 1; m < zopt_objects; m++) {
 904                                 if (zopt_object[m] < vd->vdev_ms_count)
 905                                         dump_metaslab(
 906                                             vd->vdev_ms[zopt_object[m]]);
 907                                 else
 908                                         (void) fprintf(stderr, "bad metaslab "
 909                                             "number %llu\n",
 910                                             (u_longlong_t)zopt_object[m]);
 911                         }
 912                         (void) printf("\n");
 913                         return;
 914                 }
 915                 children = c + 1;
 916         }
 917         for (; c < children; c++) {
 918                 vd = rvd->vdev_child[c];
 919                 print_vdev_metaslab_header(vd);
 920 
 921                 for (m = 0; m < vd->vdev_ms_count; m++)
 922                         dump_metaslab(vd->vdev_ms[m]);
 923                 (void) printf("\n");
 924         }
 925 }
 926 
 927 static void
 928 dump_dde(const ddt_t *ddt, const ddt_entry_t *dde, uint64_t index)
 929 {
 930         const ddt_phys_t *ddp = dde->dde_phys;
 931         const ddt_key_t *ddk = &dde->dde_key;
 932         const char *types[4] = { "ditto", "single", "double", "triple" };
 933         char blkbuf[BP_SPRINTF_LEN];
 934         blkptr_t blk;
 935 
 936         for (int p = 0; p < DDT_PHYS_TYPES; p++, ddp++) {
 937                 if (ddp->ddp_phys_birth == 0)
 938                         continue;
 939                 ddt_bp_create(ddt->ddt_checksum, ddk, ddp, &blk);
 940                 snprintf_blkptr(blkbuf, sizeof (blkbuf), &blk);
 965             "dedup * compress / copies = %.2f\n\n",
 966             dedup, compress, copies, dedup * compress / copies);
 967 }
 968 
 969 static void
 970 dump_ddt(ddt_t *ddt, enum ddt_type type, enum ddt_class class)
 971 {
 972         char name[DDT_NAMELEN];
 973         ddt_entry_t dde;
 974         uint64_t walk = 0;
 975         dmu_object_info_t doi;
 976         uint64_t count, dspace, mspace;
 977         int error;
 978 
 979         error = ddt_object_info(ddt, type, class, &doi);
 980 
 981         if (error == ENOENT)
 982                 return;
 983         ASSERT(error == 0);
 984 
 985         (void) ddt_object_count(ddt, type, class, &count);
 986         if (count == 0)
 987                 return;
 988 
 989         dspace = doi.doi_physical_blocks_512 << 9;
 990         mspace = doi.doi_fill_count * doi.doi_data_block_size;
 991 
 992         ddt_object_name(ddt, type, class, name);
 993 
 994         (void) printf("%s: %llu entries, size %llu on disk, %llu in core\n",
 995             name,
 996             (u_longlong_t)count,
 997             (u_longlong_t)(dspace / count),
 998             (u_longlong_t)(mspace / count));
 999 
1000         if (dump_opt['D'] < 3)
1001                 return;
1002 
1003         zpool_dump_ddt(NULL, &ddt->ddt_histogram[type][class]);
1004 
1005         if (dump_opt['D'] < 4)
1006                 return;
1077         char prefix[256];
1078 
1079         spa_vdev_state_enter(spa, SCL_NONE);
1080         required = vdev_dtl_required(vd);
1081         (void) spa_vdev_state_exit(spa, NULL, 0);
1082 
1083         if (indent == 0)
1084                 (void) printf("\nDirty time logs:\n\n");
1085 
1086         (void) printf("\t%*s%s [%s]\n", indent, "",
1087             vd->vdev_path ? vd->vdev_path :
1088             vd->vdev_parent ? vd->vdev_ops->vdev_op_type : spa_name(spa),
1089             required ? "DTL-required" : "DTL-expendable");
1090 
1091         for (int t = 0; t < DTL_TYPES; t++) {
1092                 range_tree_t *rt = vd->vdev_dtl[t];
1093                 if (range_tree_space(rt) == 0)
1094                         continue;
1095                 (void) snprintf(prefix, sizeof (prefix), "\t%*s%s",
1096                     indent + 2, "", name[t]);
1097                 mutex_enter(rt->rt_lock);
1098                 range_tree_walk(rt, dump_dtl_seg, prefix);
1099                 mutex_exit(rt->rt_lock);
1100                 if (dump_opt['d'] > 5 && vd->vdev_children == 0)
1101                         dump_spacemap(spa->spa_meta_objset, vd->vdev_dtl_sm);
1102         }
1103 
1104         for (unsigned c = 0; c < vd->vdev_children; c++)
1105                 dump_dtl(vd->vdev_child[c], indent + 4);
1106 }
1107 
1108 static void
1109 dump_history(spa_t *spa)
1110 {
1111         nvlist_t **events = NULL;
1112         uint64_t resid, len, off = 0;
1113         uint64_t buflen;
1114         uint_t num = 0;
1115         int error;
1116         time_t tsec;
1117         struct tm t;
1118         char tbuf[30];
1119         char internalstr[MAXPATHLEN];
1120 
1121         buflen = SPA_MAXBLOCKSIZE;
1122         char *buf = umem_alloc(buflen, UMEM_NOFAIL);
1123         do {
1124                 len = buflen;
1125 
1126                 if ((error = spa_history_get(spa, &off, &len, buf)) != 0) {
1127                         break;
1128                 }
1129 
1130                 error = zpool_history_unpack(buf, len, &resid, &events, &num);
1131                 if (error != 0) {
1132                         break;
1133                 }
1134 
1135                 off -= resid;
 1136                 if (resid == len) {
 1137                         umem_free(buf, buflen);
 1138                         buflen *= 2;
 1139                         buf = umem_alloc(buflen, UMEM_NOFAIL);
 1140                 }
1146         } while (len != 0);
1147         umem_free(buf, buflen);
1148 
1149         if (error != 0) {
1150                 (void) fprintf(stderr, "Unable to read history: %s\n",
1151                     strerror(error));
1152                 goto err;
1153         }
1154 
1155         (void) printf("\nHistory:\n");
1156         for (unsigned i = 0; i < num; i++) {
1157                 uint64_t time, txg, ievent;
1158                 char *cmd, *intstr;
1159                 boolean_t printed = B_FALSE;
1160 
1161                 if (nvlist_lookup_uint64(events[i], ZPOOL_HIST_TIME,
1162                     &time) != 0)
1163                         goto next;
1164                 if (nvlist_lookup_string(events[i], ZPOOL_HIST_CMD,
1165                     &cmd) != 0) {
1166                         if (nvlist_lookup_uint64(events[i],
1167                             ZPOOL_HIST_INT_EVENT, &ievent) != 0)
1168                                 goto next;
1169                         verify(nvlist_lookup_uint64(events[i],
1170                             ZPOOL_HIST_TXG, &txg) == 0);
1171                         verify(nvlist_lookup_string(events[i],
1172                             ZPOOL_HIST_INT_STR, &intstr) == 0);
1173                         if (ievent >= ZFS_NUM_LEGACY_HISTORY_EVENTS)
1174                                 goto next;
1176                         (void) snprintf(internalstr,
1177                             sizeof (internalstr),
1178                             "[internal %s txg:%ju] %s",
1179                             zfs_history_event_names[ievent], (uintmax_t)txg,
1180                             intstr);
1181                         cmd = internalstr;
1182                 }
1183                 tsec = time;
1184                 (void) localtime_r(&tsec, &t);
1185                 (void) strftime(tbuf, sizeof (tbuf), "%F.%T", &t);
1186                 (void) printf("%s %s\n", tbuf, cmd);
1187                 printed = B_TRUE;
1188 
1189 next:
1190                 if (dump_opt['h'] > 1) {
1191                         if (!printed)
1192                                 (void) printf("unrecognized record:\n");
1193                         dump_nvlist(events[i], 2);
1194                 }
1195         }
1196 err:
1197         for (unsigned i = 0; i < num; i++) {
1198                 nvlist_free(events[i]);
1199         }
1200         free(events);
1201 }
1202 
1203 /*ARGSUSED*/
1204 static void
1205 dump_dnode(objset_t *os, uint64_t object, void *data, size_t size)
1206 {
1207 }
1208 
1209 static uint64_t
1210 blkid2offset(const dnode_phys_t *dnp, const blkptr_t *bp,
1211     const zbookmark_phys_t *zb)
1212 {
1213         if (dnp == NULL) {
1214                 ASSERT(zb->zb_level < 0);
1215                 if (zb->zb_object == 0)
1216                         return (zb->zb_blkid);
1217                 return (zb->zb_blkid * BP_GET_LSIZE(bp));
1218         }
1219 
1220         ASSERT(zb->zb_level >= 0);
2095 
2096         dmu_objset_name(os, osname);
2097 
2098         (void) printf("Dataset %s [%s], ID %llu, cr_txg %llu, "
2099             "%s, %llu objects%s\n",
2100             osname, type, (u_longlong_t)dmu_objset_id(os),
2101             (u_longlong_t)dds.dds_creation_txg,
2102             numbuf, (u_longlong_t)usedobjs, blkbuf);
2103 
2104         if (zopt_objects != 0) {
2105                 for (i = 0; i < zopt_objects; i++)
2106                         dump_object(os, zopt_object[i], verbosity,
2107                             &print_header);
2108                 (void) printf("\n");
2109                 return;
2110         }
2111 
2112         if (dump_opt['i'] != 0 || verbosity >= 2)
2113                 dump_intent_log(dmu_objset_zil(os));
2114 
2115         if (dmu_objset_ds(os) != NULL)
2116                 dump_deadlist(&dmu_objset_ds(os)->ds_deadlist);
 2117 
2118         if (verbosity < 2)
2119                 return;
2120 
2121         if (BP_IS_HOLE(os->os_rootbp))
2122                 return;
2123 
2124         dump_object(os, 0, verbosity, &print_header);
2125         object_count = 0;
2126         if (DMU_USERUSED_DNODE(os) != NULL &&
2127             DMU_USERUSED_DNODE(os)->dn_type != 0) {
2128                 dump_object(os, DMU_USERUSED_OBJECT, verbosity, &print_header);
2129                 dump_object(os, DMU_GROUPUSED_OBJECT, verbosity, &print_header);
2130         }
2131 
2132         object = 0;
2133         while ((error = dmu_object_next(os, &object, B_FALSE, 0)) == 0) {
2134                 dump_object(os, object, verbosity, &print_header);
2135                 object_count++;
2136         }
2137 
2440                         if (!dump_opt['q'])
2441                                 dump_nvlist(config, 4);
2442                         if ((nvlist_lookup_nvlist(config,
2443                             ZPOOL_CONFIG_VDEV_TREE, &vdev_tree) != 0) ||
2444                             (nvlist_lookup_uint64(vdev_tree,
2445                             ZPOOL_CONFIG_ASHIFT, &ashift) != 0))
2446                                 ashift = SPA_MINBLOCKSHIFT;
2447                         nvlist_free(config);
2448                         label_found = B_TRUE;
2449                 }
2450                 if (dump_opt['u'])
2451                         dump_label_uberblocks(&label, ashift);
2452         }
2453 
2454         (void) close(fd);
2455 
2456         return (label_found ? 0 : 2);
2457 }
2458 
2459 static uint64_t dataset_feature_count[SPA_FEATURES];
2460 
2461 /*ARGSUSED*/
2462 static int
2463 dump_one_dir(const char *dsname, void *arg)
2464 {
2465         int error;
2466         objset_t *os;
2467 
2468         error = open_objset(dsname, DMU_OST_ANY, FTAG, &os);
2469         if (error != 0)
2470                 return (0);
2471 
2472         for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
2473                 if (!dmu_objset_ds(os)->ds_feature_inuse[f])
2474                         continue;
2475                 ASSERT(spa_feature_table[f].fi_flags &
2476                     ZFEATURE_FLAG_PER_DATASET);
2477                 dataset_feature_count[f]++;
2478         }
2479 
2480         dump_dir(os);
2481         close_objset(os, FTAG);
2482         fuid_table_destroy();
2483         return (0);
2484 }
2485 
2486 /*
2487  * Block statistics.
2488  */
2489 #define PSIZE_HISTO_SIZE (SPA_OLD_MAXBLOCKSIZE / SPA_MINBLOCKSIZE + 2)
2490 typedef struct zdb_blkstats {
2491         uint64_t zb_asize;
2492         uint64_t zb_lsize;
2493         uint64_t zb_psize;
2494         uint64_t zb_count;
2495         uint64_t zb_gangs;
2496         uint64_t zb_ditto_samevdev;
2497         uint64_t zb_psize_histogram[PSIZE_HISTO_SIZE];
2498 } zdb_blkstats_t;
2499 
2500 /*
2501  * Extended object types to report deferred frees and dedup auto-ditto blocks.
2502  */
2503 #define ZDB_OT_DEFERRED (DMU_OT_NUMTYPES + 0)
2504 #define ZDB_OT_DITTO    (DMU_OT_NUMTYPES + 1)
2505 #define ZDB_OT_OTHER    (DMU_OT_NUMTYPES + 2)
2506 #define ZDB_OT_TOTAL    (DMU_OT_NUMTYPES + 3)
2507 
2508 static const char *zdb_ot_extname[] = {
2509         "deferred free",
2510         "dedup ditto",
2511         "other",
2512         "Total",
2513 };
2514 
2515 #define ZB_TOTAL        DN_MAX_LEVELS
2516 
2517 typedef struct zdb_cb {
2518         zdb_blkstats_t  zcb_type[ZB_TOTAL + 1][ZDB_OT_TOTAL + 1];
2519         uint64_t        zcb_dedup_asize;
2520         uint64_t        zcb_dedup_blocks;
2521         uint64_t        zcb_embedded_blocks[NUM_BP_EMBEDDED_TYPES];
2522         uint64_t        zcb_embedded_histogram[NUM_BP_EMBEDDED_TYPES]
2523             [BPE_PAYLOAD_SIZE];
2524         uint64_t        zcb_start;
2525         hrtime_t        zcb_lastprint;
2526         uint64_t        zcb_totalasize;
2527         uint64_t        zcb_errors[256];
2528         int             zcb_readfails;
2529         int             zcb_haderrors;
2530         spa_t           *zcb_spa;
2531 } zdb_cb_t;
2532 
2533 static void
2534 zdb_count_block(zdb_cb_t *zcb, zilog_t *zilog, const blkptr_t *bp,
2535     dmu_object_type_t type)
2536 {
2537         uint64_t refcnt = 0;
2538 
2539         ASSERT(type < ZDB_OT_TOTAL);
2540 
2541         if (zilog && zil_bp_tree_add(zilog, bp) != 0)
2542                 return;
2543 
2544         for (int i = 0; i < 4; i++) {
2545                 int l = (i < 2) ? BP_GET_LEVEL(bp) : ZB_TOTAL;
2546                 int t = (i & 1) ? type : ZDB_OT_TOTAL;
2547                 int equal;
2548                 zdb_blkstats_t *zb = &zcb->zcb_type[l][t];
2549 
2550                 zb->zb_asize += BP_GET_ASIZE(bp);
2581                         break;
2582                 }
2583 
2584         }
2585 
2586         if (BP_IS_EMBEDDED(bp)) {
2587                 zcb->zcb_embedded_blocks[BPE_GET_ETYPE(bp)]++;
2588                 zcb->zcb_embedded_histogram[BPE_GET_ETYPE(bp)]
2589                     [BPE_GET_PSIZE(bp)]++;
2590                 return;
2591         }
2592 
2593         if (dump_opt['L'])
2594                 return;
2595 
2596         if (BP_GET_DEDUP(bp)) {
2597                 ddt_t *ddt;
2598                 ddt_entry_t *dde;
2599 
2600                 ddt = ddt_select(zcb->zcb_spa, bp);
2601                 dde = ddt_lookup(ddt, bp, B_FALSE);
2602 
2603                 if (dde == NULL) {
2604                         refcnt = 0;
2605                 } else {
2606                         ddt_phys_t *ddp = ddt_phys_select(dde, bp);
2607 
2608                         /* no other competitors for dde */
2609                         dde_exit(dde);
2610 
2611                         ddt_phys_decref(ddp);
2612                         refcnt = ddp->ddp_refcnt;
2613                         if (ddt_phys_total_refcnt(dde) == 0)
2614                                 ddt_remove(ddt, dde);
2615                 }
2616         }
2617 
2618         VERIFY3U(zio_wait(zio_claim(NULL, zcb->zcb_spa,
2619             refcnt ? 0 : spa_first_txg(zcb->zcb_spa),
2620             bp, NULL, NULL, ZIO_FLAG_CANFAIL)), ==, 0);
2621 }
2622 
2623 static void
2624 zdb_blkptr_done(zio_t *zio)
2625 {
2626         spa_t *spa = zio->io_spa;
2627         blkptr_t *bp = zio->io_bp;
2628         int ioerr = zio->io_error;
2629         zdb_cb_t *zcb = zio->io_private;
2630         zbookmark_phys_t *zb = &zio->io_bookmark;
2631 
2632         abd_free(zio->io_abd);
2633 
2634         mutex_enter(&spa->spa_scrub_lock);
2635         spa->spa_scrub_inflight--;
2673         if (dump_opt['b'] >= 5 && bp->blk_birth > 0) {
2674                 char blkbuf[BP_SPRINTF_LEN];
2675                 snprintf_blkptr(blkbuf, sizeof (blkbuf), bp);
2676                 (void) printf("objset %llu object %llu "
2677                     "level %lld offset 0x%llx %s\n",
2678                     (u_longlong_t)zb->zb_objset,
2679                     (u_longlong_t)zb->zb_object,
2680                     (longlong_t)zb->zb_level,
2681                     (u_longlong_t)blkid2offset(dnp, bp, zb),
2682                     blkbuf);
2683         }
2684 
2685         if (BP_IS_HOLE(bp))
2686                 return (0);
2687 
2688         type = BP_GET_TYPE(bp);
2689 
2690         zdb_count_block(zcb, zilog, bp,
2691             (type & DMU_OT_NEWTYPE) ? ZDB_OT_OTHER : type);
2692 
2693         is_metadata = BP_IS_METADATA(bp);
2694 
2695         if (!BP_IS_EMBEDDED(bp) &&
2696             (dump_opt['c'] > 1 || (dump_opt['c'] && is_metadata))) {
2697                 size_t size = BP_GET_PSIZE(bp);
2698                 abd_t *abd = abd_alloc(size, B_FALSE);
2699                 int flags = ZIO_FLAG_CANFAIL | ZIO_FLAG_SCRUB | ZIO_FLAG_RAW;
2700 
2701                 /* If it's an intent log block, failure is expected. */
2702                 if (zb->zb_level == ZB_ZIL_LEVEL)
2703                         flags |= ZIO_FLAG_SPECULATIVE;
2704 
2705                 mutex_enter(&spa->spa_scrub_lock);
2706                 while (spa->spa_scrub_inflight > max_inflight)
2707                         cv_wait(&spa->spa_scrub_io_cv, &spa->spa_scrub_lock);
2708                 spa->spa_scrub_inflight++;
2709                 mutex_exit(&spa->spa_scrub_lock);
2710 
2711                 zio_nowait(zio_read(NULL, spa, bp, abd, size,
2712                     zdb_blkptr_done, zcb, ZIO_PRIORITY_ASYNC_READ, flags, zb));
2713         }
2776                 if (ddb.ddb_class == DDT_CLASS_UNIQUE)
2777                         return;
2778 
2779                 ASSERT(ddt_phys_total_refcnt(&dde) > 1);
2780 
2781                 for (int p = 0; p < DDT_PHYS_TYPES; p++, ddp++) {
2782                         if (ddp->ddp_phys_birth == 0)
2783                                 continue;
2784                         ddt_bp_create(ddb.ddb_checksum,
2785                             &dde.dde_key, ddp, &blk);
2786                         if (p == DDT_PHYS_DITTO) {
2787                                 zdb_count_block(zcb, NULL, &blk, ZDB_OT_DITTO);
2788                         } else {
2789                                 zcb->zcb_dedup_asize +=
2790                                     BP_GET_ASIZE(&blk) * (ddp->ddp_refcnt - 1);
2791                                 zcb->zcb_dedup_blocks++;
2792                         }
2793                 }
2794                 if (!dump_opt['L']) {
2795                         ddt_t *ddt = spa->spa_ddt[ddb.ddb_checksum];
2796                         ddt_entry_t *dde;
2797                         VERIFY((dde = ddt_lookup(ddt, &blk, B_TRUE)) != NULL);
2798                         dde_exit(dde);
2799                 }
2800         }
2801 
2802         ASSERT(error == ENOENT);
2803 }
2804 
 2805 static void
2806 zdb_leak_init(spa_t *spa, zdb_cb_t *zcb)
2807 {
2808         zcb->zcb_spa = spa;
2809 
2810         if (!dump_opt['L']) {
2811                 vdev_t *rvd = spa->spa_root_vdev;
2812 
2813                 /*
2814                  * We are going to be changing the meaning of the metaslab's
2815                  * ms_tree.  Ensure that the allocator doesn't try to
2816                  * use the tree.
2817                  */
2818                 spa->spa_normal_class->mc_ops = &zdb_metaslab_ops;
2819                 spa->spa_log_class->mc_ops = &zdb_metaslab_ops;
2820 
2821                 for (uint64_t c = 0; c < rvd->vdev_children; c++) {
2822                         vdev_t *vd = rvd->vdev_child[c];
2823                         metaslab_group_t *mg = vd->vdev_mg;
2824                         for (uint64_t m = 0; m < vd->vdev_ms_count; m++) {
2825                                 metaslab_t *msp = vd->vdev_ms[m];
2826                                 ASSERT3P(msp->ms_group, ==, mg);
2827                                 mutex_enter(&msp->ms_lock);
2828                                 metaslab_unload(msp);
2829 
2830                                 /*
2831                                  * For leak detection, we overload the metaslab
2832                                  * ms_tree to contain allocated segments
2833                                  * instead of free segments. As a result,
2834                                  * we can't use the normal metaslab_load/unload
2835                                  * interfaces.
2836                                  */
2837                                 if (msp->ms_sm != NULL) {
2838                                         (void) fprintf(stderr,
2839                                             "\rloading space map for "
2840                                             "vdev %llu of %llu, "
2841                                             "metaslab %llu of %llu ...",
2842                                             (longlong_t)c,
2843                                             (longlong_t)rvd->vdev_children,
2844                                             (longlong_t)m,
2845                                             (longlong_t)vd->vdev_ms_count);
2846 
2847                                         /*
2848                                          * We don't want to spend the CPU
2849                                          * manipulating the size-ordered
2850                                          * tree, so clear the range_tree
2851                                          * ops.
2852                                          */
2853                                         msp->ms_tree->rt_ops = NULL;
2854                                         VERIFY0(space_map_load(msp->ms_sm,
2855                                             msp->ms_tree, SM_ALLOC));
2856 
2857                                         if (!msp->ms_loaded) {
2858                                                 msp->ms_loaded = B_TRUE;
2859                                         }
2860                                 }
2861                                 mutex_exit(&msp->ms_lock);
2862                         }
2863                 }
2864                 (void) fprintf(stderr, "\n");
 2865         }
2866 
2867         spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
2868 
2869         zdb_ddt_leak_init(spa, zcb);
2870 
2871         spa_config_exit(spa, SCL_CONFIG, FTAG);
2872 }
2873 
2874 static void
2875 zdb_leak_fini(spa_t *spa)
2876 {
2877         if (!dump_opt['L']) {
2878                 vdev_t *rvd = spa->spa_root_vdev;
2879                 for (unsigned c = 0; c < rvd->vdev_children; c++) {
2880                         vdev_t *vd = rvd->vdev_child[c];
2881                         metaslab_group_t *mg = vd->vdev_mg;
2882                         for (unsigned m = 0; m < vd->vdev_ms_count; m++) {
2883                                 metaslab_t *msp = vd->vdev_ms[m];
2884                                 ASSERT3P(mg, ==, msp->ms_group);
2885                                 mutex_enter(&msp->ms_lock);
2886 
2887                                 /*
2888                                  * The ms_tree has been overloaded to
2889                                  * contain allocated segments. Now that we
2890                                  * finished traversing all blocks, any
2891                                  * block that remains in the ms_tree
2892                                  * represents an allocated block that we
2893                                  * did not claim during the traversal.
2894                                  * Claimed blocks would have been removed
2895                                  * from the ms_tree.
2896                                  */
2897                                 range_tree_vacate(msp->ms_tree, zdb_leak, vd);
2898 
2899                                 if (msp->ms_loaded) {
2900                                         msp->ms_loaded = B_FALSE;
2901                                 }
2902 
2903                                 mutex_exit(&msp->ms_lock);
2904                         }
2905                 }
2906         }
2907 }
2908 
2909 /* ARGSUSED */
2910 static int
2911 count_block_cb(void *arg, const blkptr_t *bp, dmu_tx_t *tx)
2912 {
2913         zdb_cb_t *zcb = arg;
2914 
2915         if (dump_opt['b'] >= 5) {
2916                 char blkbuf[BP_SPRINTF_LEN];
2917                 snprintf_blkptr(blkbuf, sizeof (blkbuf), bp);
2918                 (void) printf("[%s] %s\n",
2919                     "deferred free", blkbuf);
2920         }
2921         zdb_count_block(zcb, NULL, bp, ZDB_OT_DEFERRED);
2922         return (0);
2923 }
2924 
2925 static int
2926 dump_block_stats(spa_t *spa)
2927 {
2928         zdb_cb_t zcb;
2929         zdb_blkstats_t *zb, *tzb;
2930         uint64_t norm_alloc, spec_alloc, norm_space, total_alloc, total_found;
2931         int flags = TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA | TRAVERSE_HARD;
2932         boolean_t leaks = B_FALSE;
2933 
2934         bzero(&zcb, sizeof (zcb));
2935         (void) printf("\nTraversing all blocks %s%s%s%s%s...\n\n",
2936             (dump_opt['c'] || !dump_opt['L']) ? "to verify " : "",
2937             (dump_opt['c'] == 1) ? "metadata " : "",
2938             dump_opt['c'] ? "checksums " : "",
2939             (dump_opt['c'] && !dump_opt['L']) ? "and verify " : "",
2940             !dump_opt['L'] ? "nothing leaked " : "");
2941 
2942         /*
2943          * Load all space maps as SM_ALLOC maps, then traverse the pool
2944          * claiming each block we discover.  If the pool is perfectly
2945          * consistent, the space maps will be empty when we're done.
2946          * Anything left over is a leak; any block we can't claim (because
2947          * it's not part of any space map) is a double allocation,
2948          * reference to a freed block, or an unclaimed log block.
2949          */
2950         zdb_leak_init(spa, &zcb);
2951 
2952         /*
2953          * If there's a deferred-free bplist, process that first.
2954          */
2955         (void) bpobj_iterate_nofree(&spa->spa_deferred_bpobj,
2956             count_block_cb, &zcb, NULL);
2957         if (spa_version(spa) >= SPA_VERSION_DEADLISTS) {
2958                 (void) bpobj_iterate_nofree(&spa->spa_dsl_pool->dp_free_bpobj,
2959                     count_block_cb, &zcb, NULL);
2960         }
2961         if (spa_feature_is_active(spa, SPA_FEATURE_ASYNC_DESTROY)) {
2962                 VERIFY3U(0, ==, bptree_iterate(spa->spa_meta_objset,
2963                     spa->spa_dsl_pool->dp_bptree_obj, B_FALSE, count_block_cb,
2964                     &zcb, NULL));
2965         }
2966 
2967         if (dump_opt['c'] > 1)
2968                 flags |= TRAVERSE_PREFETCH_DATA;
2969 
2970         zcb.zcb_totalasize = metaslab_class_get_alloc(spa_normal_class(spa));
2971         zcb.zcb_start = zcb.zcb_lastprint = gethrtime();
2972         zcb.zcb_haderrors |= traverse_pool(spa, 0, UINT64_MAX,
2973             flags, zdb_blkptr_cb, &zcb, NULL);
2974 
2975         /*
2976          * If we've traversed the data blocks then we need to wait for those
2977          * I/Os to complete. We leverage "The Godfather" zio to wait on
2978          * all async I/Os to complete.
2979          */
2980         if (dump_opt['c']) {
2981                 for (int i = 0; i < max_ncpus; i++) {
2982                         (void) zio_wait(spa->spa_async_zio_root[i]);
2983                         spa->spa_async_zio_root[i] = zio_root(spa, NULL, NULL,
2984                             ZIO_FLAG_CANFAIL | ZIO_FLAG_SPECULATIVE |
2985                             ZIO_FLAG_GODFATHER);
2986                 }
2987         }
2988 
2989         if (zcb.zcb_haderrors) {
2990                 (void) printf("\nError counts:\n\n");
2991                 (void) printf("\t%5s  %s\n", "errno", "count");
2992                 for (int e = 0; e < 256; e++) {
2993                         if (zcb.zcb_errors[e] != 0) {
2994                                 (void) printf("\t%5d  %llu\n",
2995                                     e, (u_longlong_t)zcb.zcb_errors[e]);
2996                         }
2997                 }
2998         }
2999 
3000         /*
3001          * Report any leaked segments.
3002          */
3003         zdb_leak_fini(spa);
3004 
3005         tzb = &zcb.zcb_type[ZB_TOTAL][ZDB_OT_TOTAL];
3006 
3007         norm_alloc = metaslab_class_get_alloc(spa_normal_class(spa));
3008         spec_alloc = metaslab_class_get_alloc(spa_special_class(spa));
3009         norm_space = metaslab_class_get_space(spa_normal_class(spa));
3010 
3011         norm_alloc += spec_alloc;
3012         total_alloc = norm_alloc + metaslab_class_get_alloc(spa_log_class(spa));
3013         total_found = tzb->zb_asize - zcb.zcb_dedup_asize;
3014 
3015         if (total_found == total_alloc) {
3016                 if (!dump_opt['L'])
3017                         (void) printf("\n\tNo leaks (block sum matches space"
3018                             " maps exactly)\n");
3019         } else {
3020                 (void) printf("block traversal size %llu != alloc %llu "
3021                     "(%s %lld)\n",
3022                     (u_longlong_t)total_found,
3023                     (u_longlong_t)total_alloc,
3024                     (dump_opt['L']) ? "unreachable" : "leaked",
3025                     (longlong_t)(total_alloc - total_found));
3026                 leaks = B_TRUE;
3027         }
3028 
3029         if (tzb->zb_count == 0)
3030                 return (2);
3031 
3032         (void) printf("\n");
3033         (void) printf("\tbp count:      %10llu\n",
 3034             (u_longlong_t)tzb->zb_count);
3035         (void) printf("\tganged count:  %10llu\n",
3036             (longlong_t)tzb->zb_gangs);
3037         (void) printf("\tbp logical:    %10llu      avg: %6llu\n",
3038             (u_longlong_t)tzb->zb_lsize,
3039             (u_longlong_t)(tzb->zb_lsize / tzb->zb_count));
3040         (void) printf("\tbp physical:   %10llu      avg:"
3041             " %6llu     compression: %6.2f\n",
3042             (u_longlong_t)tzb->zb_psize,
3043             (u_longlong_t)(tzb->zb_psize / tzb->zb_count),
3044             (double)tzb->zb_lsize / tzb->zb_psize);
3045         (void) printf("\tbp allocated:  %10llu      avg:"
3046             " %6llu     compression: %6.2f\n",
3047             (u_longlong_t)tzb->zb_asize,
3048             (u_longlong_t)(tzb->zb_asize / tzb->zb_count),
3049             (double)tzb->zb_lsize / tzb->zb_asize);
3050         (void) printf("\tbp deduped:    %10llu    ref>1:"
3051             " %6llu   deduplication: %6.2f\n",
3052             (u_longlong_t)zcb.zcb_dedup_asize,
3053             (u_longlong_t)zcb.zcb_dedup_blocks,
3054             (double)zcb.zcb_dedup_asize / tzb->zb_asize + 1.0);
3055         if (spec_alloc != 0) {
3056                 (void) printf("\tspecial allocated: %10llu\n",
3057                     (u_longlong_t)spec_alloc);
3058         }
3059         (void) printf("\tSPA allocated: %10llu     used: %5.2f%%\n",
3060             (u_longlong_t)norm_alloc, 100.0 * norm_alloc / norm_space);
3061 
3062         for (bp_embedded_type_t i = 0; i < NUM_BP_EMBEDDED_TYPES; i++) {
3063                 if (zcb.zcb_embedded_blocks[i] == 0)
3064                         continue;
3065                 (void) printf("\n");
3066                 (void) printf("\tadditional, non-pointer bps of type %u: "
3067                     "%10llu\n",
3068                     i, (u_longlong_t)zcb.zcb_embedded_blocks[i]);
3069 
3070                 if (dump_opt['b'] >= 3) {
3071                         (void) printf("\t number of (compressed) bytes:  "
3072                             "number of bps\n");
3073                         dump_histogram(zcb.zcb_embedded_histogram[i],
3074                             sizeof (zcb.zcb_embedded_histogram[i]) /
3075                             sizeof (zcb.zcb_embedded_histogram[i][0]), 0);
3076                 }
3077         }
3078 
3079         if (tzb->zb_ditto_samevdev != 0) {
3080                 (void) printf("\tDittoed blocks on same vdev: %llu\n",
3081                     (longlong_t)tzb->zb_ditto_samevdev);
3082         }
3083 
3084         if (dump_opt['b'] >= 2) {
3085                 int l, t, level;
3086                 (void) printf("\nBlocks\tLSIZE\tPSIZE\tASIZE"
3087                     "\t  avg\t comp\t%%Total\tType\n");
3088 
3089                 for (t = 0; t <= ZDB_OT_TOTAL; t++) {
3090                         char csize[32], lsize[32], psize[32], asize[32];
3091                         char avg[32], gang[32];
3092                         const char *typename;
3093 
3094                         /* make sure nicenum has enough space */
3095                         CTASSERT(sizeof (csize) >= NN_NUMBUF_SZ);
3096                         CTASSERT(sizeof (lsize) >= NN_NUMBUF_SZ);
3097                         CTASSERT(sizeof (psize) >= NN_NUMBUF_SZ);
3098                         CTASSERT(sizeof (asize) >= NN_NUMBUF_SZ);
3099                         CTASSERT(sizeof (avg) >= NN_NUMBUF_SZ);
3100                         CTASSERT(sizeof (gang) >= NN_NUMBUF_SZ);
3101 
3102                         if (t < DMU_OT_NUMTYPES)
3103                                 typename = dmu_ot[t].ot_name;


3196 static int
3197 zdb_ddt_add_cb(spa_t *spa, zilog_t *zilog, const blkptr_t *bp,
3198     const zbookmark_phys_t *zb, const dnode_phys_t *dnp, void *arg)
3199 {
3200         avl_tree_t *t = arg;
3201         avl_index_t where;
3202         zdb_ddt_entry_t *zdde, zdde_search;
3203 
3204         if (bp == NULL || BP_IS_HOLE(bp) || BP_IS_EMBEDDED(bp))
3205                 return (0);
3206 
3207         if (dump_opt['S'] > 1 && zb->zb_level == ZB_ROOT_LEVEL) {
3208                 (void) printf("traversing objset %llu, %llu objects, "
3209                     "%lu blocks so far\n",
3210                     (u_longlong_t)zb->zb_objset,
3211                     (u_longlong_t)BP_GET_FILL(bp),
3212                     avl_numnodes(t));
3213         }
3214 
3215         if (BP_IS_HOLE(bp) || BP_GET_CHECKSUM(bp) == ZIO_CHECKSUM_OFF ||
3216             BP_IS_METADATA(bp))
3217                 return (0);
3218 
3219         ddt_key_fill(&zdde_search.zdde_key, bp);
3220 
3221         zdde = avl_find(t, &zdde_search, &where);
3222 
3223         if (zdde == NULL) {
3224                 zdde = umem_zalloc(sizeof (*zdde), UMEM_NOFAIL);
3225                 zdde->zdde_key = zdde_search.zdde_key;
3226                 avl_insert(t, zdde, where);
3227         }
3228 
3229         zdde->zdde_ref_blocks += 1;
3230         zdde->zdde_ref_lsize += BP_GET_LSIZE(bp);
3231         zdde->zdde_ref_psize += BP_GET_PSIZE(bp);
3232         zdde->zdde_ref_dsize += bp_get_dsize_sync(spa, bp);
3233 
3234         return (0);
3235 }
3236 
3237 static void
3238 dump_simulated_ddt(spa_t *spa)
3239 {
3240         avl_tree_t t;
3241         void *cookie = NULL;
3242         zdb_ddt_entry_t *zdde;
3243         ddt_histogram_t ddh_total;
3244         ddt_stat_t dds_total;
3245 
3246         bzero(&ddh_total, sizeof (ddh_total));
3247         bzero(&dds_total, sizeof (dds_total));
3248         avl_create(&t, ddt_entry_compare,
3249             sizeof (zdb_ddt_entry_t), offsetof(zdb_ddt_entry_t, zdde_node));
3250 
3251         spa_config_enter(spa, SCL_CONFIG, FTAG, RW_READER);
3252 
3253         (void) traverse_pool(spa, 0, UINT64_MAX,
3254             TRAVERSE_PRE | TRAVERSE_PREFETCH_METADATA,
3255             zdb_ddt_add_cb, &t, NULL);
3256 
3257         spa_config_exit(spa, SCL_CONFIG, FTAG);
3258 
3259         while ((zdde = avl_destroy_nodes(&t, &cookie)) != NULL) {
3260                 ddt_stat_t dds;
3261                 uint64_t refcnt = zdde->zdde_ref_blocks;
3262                 ASSERT(refcnt != 0);
3263 
3264                 dds.dds_blocks = zdde->zdde_ref_blocks / refcnt;
3265                 dds.dds_lsize = zdde->zdde_ref_lsize / refcnt;
3266                 dds.dds_psize = zdde->zdde_ref_psize / refcnt;
3267                 dds.dds_dsize = zdde->zdde_ref_dsize / refcnt;
3268 
3269                 dds.dds_ref_blocks = zdde->zdde_ref_blocks;
3270                 dds.dds_ref_lsize = zdde->zdde_ref_lsize;
3271                 dds.dds_ref_psize = zdde->zdde_ref_psize;
3272                 dds.dds_ref_dsize = zdde->zdde_ref_dsize;
3273 
3274                 ddt_stat_add(&ddh_total.ddh_stat[highbit64(refcnt) - 1],
3275                     &dds, 0);
3276 
3277                 umem_free(zdde, sizeof (*zdde));
3278         }
3279 
3280         avl_destroy(&t);
3281 
3282         ddt_histogram_stat(&dds_total, &ddh_total);
3283 
3284         (void) printf("Simulated DDT histogram:\n");
3285 
3286         zpool_dump_ddt(&dds_total, &ddh_total);
3287 
3288         dump_dedup_ratio(&dds_total);
3289 }
3290 
3291 static void
3292 dump_zpool(spa_t *spa)
3293 {
3294         dsl_pool_t *dp = spa_get_dsl(spa);
3295         int rc = 0;
3296 
3297         if (dump_opt['S']) {
3298                 dump_simulated_ddt(spa);
3299                 return;
3300         }
3301 
3302         if (!dump_opt['e'] && dump_opt['C'] > 1) {
3303                 (void) printf("\nCached configuration:\n");
3304                 dump_nvlist(spa->spa_config, 8);
3305         }
3306 
3307         if (dump_opt['C'])
3308                 dump_config(spa);
3309 
3310         if (dump_opt['u'])
3311                 dump_uberblock(&spa->spa_uberblock, "\nUberblock:\n", "\n");
3312 
3313         if (dump_opt['D'])
3314                 dump_all_ddts(spa);
3315 
3316         if (dump_opt['d'] > 2 || dump_opt['m'])
3317                 dump_metaslabs(spa);
3318         if (dump_opt['M'])
3319                 dump_metaslab_groups(spa);
3320 
3321         if (dump_opt['d'] || dump_opt['i']) {
3322                 dump_dir(dp->dp_meta_objset);
3323                 if (dump_opt['d'] >= 3) {
3324                         dump_full_bpobj(&spa->spa_deferred_bpobj,
3325                             "Deferred frees", 0);
3326                         if (spa_version(spa) >= SPA_VERSION_DEADLISTS) {
3327                                 dump_full_bpobj(
3328                                     &spa->spa_dsl_pool->dp_free_bpobj,
3329                                     "Pool snapshot frees", 0);
3330                         }
3331 
3332                         if (spa_feature_is_active(spa,
3333                             SPA_FEATURE_ASYNC_DESTROY)) {
3334                                 dump_bptree(spa->spa_meta_objset,
3335                                     spa->spa_dsl_pool->dp_bptree_obj,
3336                                     "Pool dataset frees");
3337                         }
3338                         dump_dtl(spa->spa_root_vdev, 0);
3339                 }
3340                 (void) dmu_objset_find(spa_name(spa), dump_one_dir,
3341                     NULL, DS_FIND_SNAPSHOTS | DS_FIND_CHILDREN);
3342 
3343                 for (spa_feature_t f = 0; f < SPA_FEATURES; f++) {
3344                         uint64_t refcount;
3345 
3346                         if (!(spa_feature_table[f].fi_flags &
3347                             ZFEATURE_FLAG_PER_DATASET) ||
3348                             !spa_feature_is_enabled(spa, f)) {
3349                                 ASSERT0(dataset_feature_count[f]);
3350                                 continue;
3351                         }
3352                         (void) feature_get_refcount(spa,
3353                             &spa_feature_table[f], &refcount);
3354                         if (dataset_feature_count[f] != refcount) {
3355                                 (void) printf("%s feature refcount mismatch: "
3356                                     "%lld datasets != %lld refcount\n",
3357                                     spa_feature_table[f].fi_uname,
3358                                     (longlong_t)dataset_feature_count[f],
3359                                     (longlong_t)refcount);
3360                                 rc = 2;
3361                         } else {
3362                                 (void) printf("Verified %s feature refcount "
3363                                     "of %llu is correct\n",
3364                                     spa_feature_table[f].fi_uname,
3365                                     (longlong_t)refcount);
3366                         }
3367                 }
3368         }
3369         if (rc == 0 && (dump_opt['b'] || dump_opt['c']))
3370                 rc = dump_block_stats(spa);
3371 
3372         if (rc == 0)
3373                 rc = verify_spacemap_refcounts(spa);
3374 
3375         if (dump_opt['s'])
3376                 show_pool_stats(spa);
3377 
3378         if (dump_opt['h'])
3379                 dump_history(spa);
3380 
3381         if (rc != 0) {
3382                 dump_debug_buffer();
3383                 exit(rc);
3384         }
3385 }
3386 
3387 #define ZDB_FLAG_CHECKSUM       0x0001
3388 #define ZDB_FLAG_DECOMPRESS     0x0002


3655         BP_SET_BYTEORDER(bp, ZFS_HOST_BYTEORDER);
3656 
3657         spa_config_enter(spa, SCL_STATE, FTAG, RW_READER);
3658         zio = zio_root(spa, NULL, NULL, 0);
3659 
3660         if (vd == vd->vdev_top) {
3661                 /*
3662                  * Treat this as a normal block read.
3663                  */
3664                 zio_nowait(zio_read(zio, spa, bp, pabd, psize, NULL, NULL,
3665                     ZIO_PRIORITY_SYNC_READ,
3666                     ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL));
3667         } else {
3668                 /*
3669                  * Treat this as a vdev child I/O.
3670                  */
3671                 zio_nowait(zio_vdev_child_io(zio, bp, vd, offset, pabd,
3672                     psize, ZIO_TYPE_READ, ZIO_PRIORITY_SYNC_READ,
3673                     ZIO_FLAG_DONT_CACHE | ZIO_FLAG_DONT_QUEUE |
3674                     ZIO_FLAG_DONT_PROPAGATE | ZIO_FLAG_DONT_RETRY |
3675                     ZIO_FLAG_CANFAIL | ZIO_FLAG_RAW, NULL, NULL));
3676         }
3677 
3678         error = zio_wait(zio);
3679         spa_config_exit(spa, SCL_STATE, FTAG);
3680 
3681         if (error) {
3682                 (void) printf("Read of %s failed, error: %d\n", thing, error);
3683                 goto out;
3684         }
3685 
3686         if (flags & ZDB_FLAG_DECOMPRESS) {
3687                 /*
3688                  * We don't know how the data was compressed, so just try
3689                  * every decompress function at every inflated blocksize.
3690                  */
3691                 enum zio_compress c;
3692                 void *pbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
3693                 void *lbuf2 = umem_alloc(SPA_MAXBLOCKSIZE, UMEM_NOFAIL);
3694 
3695                 abd_copy_to_buf(pbuf2, pabd, psize);


3994         }
3995 
3996         /*
3997          * ZDB does not typically re-read blocks; therefore limit the ARC
3998          * to 256 MB, which can be used entirely for metadata.
3999          */
4000         zfs_arc_max = zfs_arc_meta_limit = 256 * 1024 * 1024;
4001 
4002         /*
4003          * "zdb -c" uses checksum-verifying scrub i/os which are async reads.
4004          * "zdb -b" uses traversal prefetch which uses async reads.
4005          * For good performance, let several of them be active at once.
4006          */
4007         zfs_vdev_async_read_max_active = 10;
4008 
4009         /*
4010          * Disable reference tracking for better performance.
4011          */
4012         reference_tracking_enable = B_FALSE;
4013 
4014         kernel_init(FREAD);
4015         g_zfs = libzfs_init();
4016         ASSERT(g_zfs != NULL);
4017 
4018         if (dump_all)
4019                 verbose = MAX(verbose, 1);
4020 
4021         for (c = 0; c < 256; c++) {
4022                 if (dump_all && strchr("AeEFlLOPRSX", c) == NULL)
4023                         dump_opt[c] = 1;
4024                 if (dump_opt[c])
4025                         dump_opt[c] += verbose;
4026         }
4027 
4028         aok = (dump_opt['A'] == 1) || (dump_opt['A'] > 2);
4029         zfs_recover = (dump_opt['A'] > 1);
4030 
4031         argc -= optind;
4032         argv += optind;
4033