NEX-19394 backport 9337 zfs get all is slow due to uncached metadata
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Thomas Caputi <tcaputi@datto.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
 Conflicts:
  usr/src/uts/common/fs/zfs/dbuf.c
  usr/src/uts/common/fs/zfs/dmu.c
  usr/src/uts/common/fs/zfs/sys/dmu_objset.h
NEX-15468 panic - Deadlock: cycle in blocking chain with dbuf_destroy calling mutex_vector_enter
Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-16904 Need to port Illumos Bug #9433 to fix ARC hit rate
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-16146 9188 increase size of dbuf cache to reduce indirect block decompression
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Prashanth Sreenivasa <pks@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Allan Jude <allanjude@freebsd.org>
Reviewed by: Igor Kozhukhov <igor@dilos.org>
Approved by: Garrett D'Amore <garrett@damore.org>
NEX-9752 backport illumos 6950 ARC should cache compressed data
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
6950 ARC should cache compressed data
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Dan Kimmel <dan.kimmel@delphix.com>
Reviewed by: Matt Ahrens <mahrens@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Don Brady <don.brady@intel.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-5366 Race between unique_insert() and unique_remove() causes ZFS fsid change
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Vatca <dan.vatca@gmail.com>
NEX-5058 WBC: Race between the purging of window and opening new one
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
NEX-2830 ZFS smart compression
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Roman Strashkin <roman.strashkin@nexenta.com>
6267 dn_bonus evicted too early
Reviewed by: Richard Yao <ryao@gentoo.org>
Reviewed by: Xin LI <delphij@freebsd.org>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
6288 dmu_buf_will_dirty could be faster
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Reviewed by: Justin Gibbs <gibbs@scsiguy.com>
Reviewed by: Richard Elling <Richard.Elling@RichardElling.com>
Approved by: Robert Mustacchi <rm@joyent.com>
5987 zfs prefetch code needs work
Reviewed by: Adam Leventhal <ahl@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Paul Dagnelie <pcd@delphix.com>
Approved by: Gordon Ross <gordon.ross@nexenta.com>
6047 SPARC boot should support feature@embedded_data
Reviewed by: Igor Kozhukhov <ikozhukhov@gmail.com>
Approved by: Dan McDonald <danmcd@omniti.com>
5959 clean up per-dataset feature count code
Reviewed by: Toomas Soome <tsoome@me.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Alex Reece <alex@delphix.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-4582 update wrc test cases for allow to use write back cache per tree of datasets
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Alex Aizman <alex.aizman@nexenta.com>
5960 zfs recv should prefetch indirect blocks
5925 zfs receive -o origin=
Reviewed by: Prakash Surya <prakash.surya@delphix.com>
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
5911 ZFS "hangs" while deleting file
Reviewed by: Bayard Bell <buffer.g.overflow@gmail.com>
Reviewed by: Alek Pinchuk <alek@nexenta.com>
Reviewed by: Simon Klinkert <simon.klinkert@gmail.com>
Reviewed by: Dan McDonald <danmcd@omniti.com>
Approved by: Richard Lowe <richlowe@richlowe.net>
NEX-1823 Slow performance doing of a large dataset
5911 ZFS "hangs" while deleting file
Reviewed by: Saso Kiselkov <saso.kiselkov@nexenta.com>
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
NEX-3558 KRRP Integration
NEX-3266 5630 stale bonus buffer in recycled dnode_t leads to data corruption
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george@delphix.com>
Reviewed by: Will Andrews <will@freebsd.org>
Approved by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Dan Fields <dan.fields@nexenta.com>
NEX-3165 segregate ddt in arc
4370 avoid transmitting holes during zfs send
4371 DMU code clean up
Reviewed by: Matthew Ahrens <mahrens@delphix.com>
Reviewed by: George Wilson <george.wilson@delphix.com>
Reviewed by: Christopher Siden <christopher.siden@delphix.com>
Reviewed by: Josef 'Jeff' Sipek <jeffpc@josefsipek.net>
Approved by: Garrett D'Amore <garrett@damore.org>
OS-80 support for vdev and CoS properties for the new I/O scheduler
OS-95 lint warning introduced by OS-61
Moved closed ZFS files to open repo, changed Makefiles accordingly
Removed unneeded weak symbols
Issue #7: add cacheability to the properties
          Contributors: Boris Protopopov
DDT is placed either into special or to L2ARC but not in both
Support for secondarycache=data option
Align mutex tables in arc.c and dbuf.c to 64 bytes (cache line), place each kmutex_t on cache line by itself to avoid false sharing
re #12585 rb4049 ZFS++ work port - refactoring to improve separation of open/closed code, bug fixes, performance improvements - open code
Bug 11205: add missing libzfs_closed_stubs.c to fix opensource-only build.
ZFS plus work: special vdevs, cos, cos/vdev properties

*** 18,28 ****
 *
 * CDDL HEADER END
 */
/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
! * Copyright 2011 Nexenta Systems, Inc. All rights reserved.
 * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
 * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
 * Copyright (c) 2013, Joyent, Inc. All rights reserved.
 * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
 * Copyright (c) 2014 Integros [integros.com]
--- 18,28 ----
 *
 * CDDL HEADER END
 */
/*
 * Copyright (c) 2005, 2010, Oracle and/or its affiliates. All rights reserved.
! * Copyright 2018 Nexenta Systems, Inc. All rights reserved.
 * Copyright (c) 2012, 2017 by Delphix. All rights reserved.
 * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
 * Copyright (c) 2013, Joyent, Inc. All rights reserved.
 * Copyright (c) 2014 Spectra Logic Corporation, All rights reserved.
 * Copyright (c) 2014 Integros [integros.com]
*** 36,56 ****
#include <sys/dmu_objset.h>
#include <sys/dsl_dataset.h>
#include <sys/dsl_dir.h>
#include <sys/dmu_tx.h>
#include <sys/spa.h>
#include <sys/zio.h>
#include <sys/dmu_zfetch.h>
#include <sys/sa.h>
#include <sys/sa_impl.h>
#include <sys/zfeature.h>
#include <sys/blkptr.h>
#include <sys/range_tree.h>
#include <sys/callb.h>
#include <sys/abd.h>
- #include <sys/vdev.h>
- #include <sys/cityhash.h>

uint_t zfs_dbuf_evict_key;

static boolean_t dbuf_undirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
static void dbuf_write(dbuf_dirty_record_t *dr, arc_buf_t *data, dmu_tx_t *tx);
--- 36,55 ----
#include <sys/dmu_objset.h>
#include <sys/dsl_dataset.h>
#include <sys/dsl_dir.h>
#include <sys/dmu_tx.h>
#include <sys/spa.h>
+ #include <sys/spa_impl.h>
#include <sys/zio.h>
#include <sys/dmu_zfetch.h>
#include <sys/sa.h>
#include <sys/sa_impl.h>
#include <sys/zfeature.h>
#include <sys/blkptr.h>
#include <sys/range_tree.h>
#include <sys/callb.h>
#include <sys/abd.h>

uint_t zfs_dbuf_evict_key;

static boolean_t dbuf_undirty(dmu_buf_impl_t *db, dmu_tx_t *tx);
static void dbuf_write(dbuf_dirty_record_t *dr, arc_buf_t *data, dmu_tx_t *tx);
*** 72,99 ****
static kmutex_t dbuf_evict_lock;
static kcondvar_t dbuf_evict_cv;
static boolean_t dbuf_evict_thread_exit;

/*
! * LRU cache of dbufs. The dbuf cache maintains a list of dbufs that
 * are not currently held but have been recently released. These dbufs
 * are not eligible for arc eviction until they are aged out of the cache.
- * Dbufs are added to the dbuf cache once the last hold is released. If a
- * dbuf is later accessed and still exists in the dbuf cache, then it will
- * be removed from the cache and later re-added to the head of the cache.
 * Dbufs that are aged out of the cache will be immediately destroyed and
 * become eligible for arc eviction.
 */
! static multilist_t *dbuf_cache;
! static refcount_t dbuf_cache_size;
! uint64_t dbuf_cache_max_bytes = 100 * 1024 * 1024;

! /* Cap the size of the dbuf cache to log2 fraction of arc size. */
! int dbuf_cache_max_shift = 5;

/*
! * The dbuf cache uses a three-stage eviction policy:
 * - A low water marker designates when the dbuf eviction thread
 *   should stop evicting from the dbuf cache.
 * - When we reach the maximum size (aka mid water mark), we
 *   signal the eviction thread to run.
 * - The high water mark indicates when the eviction thread
--- 71,132 ----
static kmutex_t dbuf_evict_lock;
static kcondvar_t dbuf_evict_cv;
static boolean_t dbuf_evict_thread_exit;

/*
! * There are two dbuf caches; each dbuf can only be in one of them at a time.
! *
! * 1. Cache of metadata dbufs, to help make read-heavy administrative commands
! *    from /sbin/zfs run faster. The "metadata cache" specifically stores dbufs
! *    that represent the metadata that describes filesystems/snapshots/
! *    bookmarks/properties/etc. We only evict from this cache when we export a
! *    pool, to short-circuit as much I/O as possible for all administrative
! *    commands that need the metadata. There is no eviction policy for this
! *    cache, because we try to only include types in it which would occupy a
! *    very small amount of space per object but create a large impact on the
! *    performance of these commands. Instead, after it reaches a maximum size
! *    (which should only happen on very small memory systems with a very large
! *    number of filesystem objects), we stop taking new dbufs into the
! *    metadata cache, instead putting them in the normal dbuf cache.
! *
! * 2. LRU cache of dbufs. The "dbuf cache" maintains a list of dbufs that
 *    are not currently held but have been recently released. These dbufs
 *    are not eligible for arc eviction until they are aged out of the cache.
 *    Dbufs that are aged out of the cache will be immediately destroyed and
 *    become eligible for arc eviction.
+ *
+ * Dbufs are added to these caches once the last hold is released. If a dbuf is
+ * later accessed and still exists in the dbuf cache, then it will be removed
+ * from the cache and later re-added to the head of the cache.
+ *
+ * If a given dbuf meets the requirements for the metadata cache, it will go
+ * there, otherwise it will be considered for the generic LRU dbuf cache. The
+ * caches and the refcounts tracking their sizes are stored in an array indexed
+ * by those caches' matching enum values (from dbuf_cached_state_t).
 */
! typedef struct dbuf_cache {
!     multilist_t *cache;
!     refcount_t size;
! } dbuf_cache_t;
! dbuf_cache_t dbuf_caches[DB_CACHE_MAX];

! /* Size limits for the caches */
! uint64_t dbuf_cache_max_bytes = 0;
! uint64_t dbuf_metadata_cache_max_bytes = 0;

! /* Set the default sizes of the caches to log2 fraction of arc size */
! int dbuf_cache_shift = 5;
!
! int dbuf_metadata_cache_shift = 6;

/*
! * For diagnostic purposes, this is incremented whenever we can't add
! * something to the metadata cache because it's full, and instead put
! * the data in the regular dbuf cache.
! */
! uint64_t dbuf_metadata_cache_overflow;
!
! /*
! * The LRU dbuf cache uses a three-stage eviction policy:
 * - A low water marker designates when the dbuf eviction thread
 *   should stop evicting from the dbuf cache.
 * - When we reach the maximum size (aka mid water mark), we
 *   signal the eviction thread to run.
 * - The high water mark indicates when the eviction thread
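The three water marks described by the comment above are derived from dbuf_cache_max_bytes and the hiwater/lowater percentage tunables used later in this diff (dbuf_cache_hiwater_pct and dbuf_cache_lowater_pct). The standalone sketch below is illustrative only and not part of the change; the 100 MB cap and the 10% figures are assumptions picked for the example.

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint64_t max_bytes = 100ULL * 1024 * 1024;  /* assumed dbuf_cache_max_bytes */
    uint64_t hiwater_pct = 10;                  /* assumed dbuf_cache_hiwater_pct */
    uint64_t lowater_pct = 10;                  /* assumed dbuf_cache_lowater_pct */

    /* mid water mark: reaching it signals the eviction thread */
    printf("signal eviction thread above: %llu bytes\n",
        (unsigned long long)max_bytes);
    /* high water mark: above it the caller also evicts one dbuf itself */
    printf("caller evicts above:          %llu bytes\n",
        (unsigned long long)(max_bytes + (max_bytes * hiwater_pct) / 100));
    /* low water mark: below it the eviction thread stops evicting */
    printf("eviction thread stops below:  %llu bytes\n",
        (unsigned long long)(max_bytes - (max_bytes * lowater_pct) / 100));
    return (0);
}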
*** 162,183 ****
}

/*
 * dbuf hash table routines
 */
static dbuf_hash_table_t dbuf_hash_table;

static uint64_t dbuf_hash_count;

- /*
- * We use Cityhash for this. It's fast, and has good hash properties without
- * requiring any large static buffers.
- */
static uint64_t
dbuf_hash(void *os, uint64_t obj, uint8_t lvl, uint64_t blkid)
{
!     return (cityhash4((uintptr_t)os, obj, (uint64_t)lvl, blkid));
}

#define DBUF_EQUAL(dbuf, os, obj, level, blkid)     \
    ((dbuf)->db.db_object == (obj) &&       \
    (dbuf)->db_objset == (os) &&            \
--- 195,226 ----
}

/*
 * dbuf hash table routines
 */
+ #pragma align 64(dbuf_hash_table)
static dbuf_hash_table_t dbuf_hash_table;

static uint64_t dbuf_hash_count;

static uint64_t
dbuf_hash(void *os, uint64_t obj, uint8_t lvl, uint64_t blkid)
{
!     uintptr_t osv = (uintptr_t)os;
!     uint64_t crc = -1ULL;
!
!     ASSERT(zfs_crc64_table[128] == ZFS_CRC64_POLY);
!     crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (lvl)) & 0xFF];
!     crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (osv >> 6)) & 0xFF];
!     crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (obj >> 0)) & 0xFF];
!     crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (obj >> 8)) & 0xFF];
!     crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (blkid >> 0)) & 0xFF];
!     crc = (crc >> 8) ^ zfs_crc64_table[(crc ^ (blkid >> 8)) & 0xFF];
!
!     crc ^= (osv>>14) ^ (obj>>16) ^ (blkid>>16);
!
!     return (crc);
}

#define DBUF_EQUAL(dbuf, os, obj, level, blkid)     \
    ((dbuf)->db.db_object == (obj) &&       \
    (dbuf)->db_objset == (os) &&            \
*** 391,401 ****
--- 434,492 ----
        return (is_metadata);
    }
}

+ boolean_t
+ dbuf_is_ddt(dmu_buf_impl_t *db)
+ {
+     boolean_t is_ddt;
+
+     DB_DNODE_ENTER(db);
+     is_ddt = (DB_DNODE(db)->dn_type == DMU_OT_DDT_ZAP) ||
+         (DB_DNODE(db)->dn_type == DMU_OT_DDT_STATS);
+     DB_DNODE_EXIT(db);
+
+     return (is_ddt);
+ }
+
+ /*
+ * This returns whether this dbuf should be stored in the metadata cache, which
+ * is based on whether it's from one of the dnode types that store data related
+ * to traversing dataset hierarchies.
+ */
+ static boolean_t
+ dbuf_include_in_metadata_cache(dmu_buf_impl_t *db)
+ {
+     DB_DNODE_ENTER(db);
+     dmu_object_type_t type = DB_DNODE(db)->dn_type;
+     DB_DNODE_EXIT(db);
+
+     /* Check if this dbuf is one of the types we care about */
+     if (DMU_OT_IS_METADATA_CACHED(type)) {
+         /* If we hit this, then we set something up wrong in dmu_ot */
+         ASSERT(DMU_OT_IS_METADATA(type));
+
+         /*
+          * Sanity check for small-memory systems: don't allocate too
+          * much memory for this purpose.
+          */
+         if (refcount_count(&dbuf_caches[DB_DBUF_METADATA_CACHE].size) >
+             dbuf_metadata_cache_max_bytes) {
+             dbuf_metadata_cache_overflow++;
+             DTRACE_PROBE1(dbuf__metadata__cache__overflow,
+                 dmu_buf_impl_t *, db);
+             return (B_FALSE);
+         }
+
+         return (B_TRUE);
+     }
+
+     return (B_FALSE);
+ }
+
/*
 * This function *must* return indices evenly distributed between all
 * sublists of the multilist. This is needed due to how the dbuf eviction
 * code is laid out; dbuf_evict_thread() assumes dbufs are evenly
 * distributed between all sublists and uses this assumption when
 * deciding which sublist to evict from and how much to evict from it.
*** 426,457 ****
dbuf_cache_above_hiwater(void)
{
    uint64_t dbuf_cache_hiwater_bytes =
        (dbuf_cache_max_bytes * dbuf_cache_hiwater_pct) / 100;

!     return (refcount_count(&dbuf_cache_size) >
        dbuf_cache_max_bytes + dbuf_cache_hiwater_bytes);
}

static inline boolean_t
dbuf_cache_above_lowater(void)
{
    uint64_t dbuf_cache_lowater_bytes =
        (dbuf_cache_max_bytes * dbuf_cache_lowater_pct) / 100;

!     return (refcount_count(&dbuf_cache_size) >
        dbuf_cache_max_bytes - dbuf_cache_lowater_bytes);
}

/*
 * Evict the oldest eligible dbuf from the dbuf cache.
 */
static void
dbuf_evict_one(void)
{
!     int idx = multilist_get_random_index(dbuf_cache);
!     multilist_sublist_t *mls = multilist_sublist_lock(dbuf_cache, idx);

    ASSERT(!MUTEX_HELD(&dbuf_evict_lock));

    /*
     * Set the thread's tsd to indicate that it's processing evictions.
--- 517,549 ----
dbuf_cache_above_hiwater(void)
{
    uint64_t dbuf_cache_hiwater_bytes =
        (dbuf_cache_max_bytes * dbuf_cache_hiwater_pct) / 100;

!     return (refcount_count(&dbuf_caches[DB_DBUF_CACHE].size) >
        dbuf_cache_max_bytes + dbuf_cache_hiwater_bytes);
}

static inline boolean_t
dbuf_cache_above_lowater(void)
{
    uint64_t dbuf_cache_lowater_bytes =
        (dbuf_cache_max_bytes * dbuf_cache_lowater_pct) / 100;

!     return (refcount_count(&dbuf_caches[DB_DBUF_CACHE].size) >
        dbuf_cache_max_bytes - dbuf_cache_lowater_bytes);
}

/*
 * Evict the oldest eligible dbuf from the dbuf cache.
 */
static void
dbuf_evict_one(void)
{
!     int idx = multilist_get_random_index(dbuf_caches[DB_DBUF_CACHE].cache);
!     multilist_sublist_t *mls = multilist_sublist_lock(
!         dbuf_caches[DB_DBUF_CACHE].cache, idx);

    ASSERT(!MUTEX_HELD(&dbuf_evict_lock));

    /*
     * Set the thread's tsd to indicate that it's processing evictions.
*** 470,481 ****
        multilist_sublist_t *, mls);

    if (db != NULL) {
        multilist_sublist_remove(mls, db);
        multilist_sublist_unlock(mls);
!         (void) refcount_remove_many(&dbuf_cache_size,
            db->db.db_size, db);
        dbuf_destroy(db);
    } else {
        multilist_sublist_unlock(mls);
    }
    (void) tsd_set(zfs_dbuf_evict_key, NULL);
--- 562,575 ----
        multilist_sublist_t *, mls);

    if (db != NULL) {
        multilist_sublist_remove(mls, db);
        multilist_sublist_unlock(mls);
!         (void) refcount_remove_many(&dbuf_caches[DB_DBUF_CACHE].size,
            db->db.db_size, db);
+         ASSERT3U(db->db_caching_status, ==, DB_DBUF_CACHE);
+         db->db_caching_status = DB_NO_CACHE;
        dbuf_destroy(db);
    } else {
        multilist_sublist_unlock(mls);
    }
    (void) tsd_set(zfs_dbuf_evict_key, NULL);
*** 524,535 ****
    thread_exit();
}

/*
 * Wake up the dbuf eviction thread if the dbuf cache is at its max size.
! * If the dbuf cache is at its high water mark, then evict a dbuf from the
! * dbuf cache using the callers context.
 */
static void
dbuf_evict_notify(void)
{
--- 618,645 ----
    thread_exit();
}

/*
 * Wake up the dbuf eviction thread if the dbuf cache is at its max size.
! *
! * Direct eviction (dbuf_evict_one()) is not called here, because
! * the function doesn't care about the selected dbuf, so the following
! * case is possible which will cause a deadlock-panic:
! *
! * Thread A is evicting dbufs that are related to dnodeA
! * dnode_evict_dbufs(dnoneA) enters dn_dbufs_mtx and after that walks
! * its own AVL of dbufs and calls dbuf_destroy():
! * dbuf_destroy() ->...-> dbuf_evict_notify() -> dbuf_evict_one() ->
! * -> select a dbuf from cache -> dbuf_destroy() ->
! * -> mutex_enter(dn_dbufs_mtx of dnoneB)
! *
! * Thread B is evicting dbufs that are related to dnodeB
! * dnode_evict_dbufs(dnoneB) enters dn_dbufs_mtx and after that walks
! * its own AVL of dbufs and calls dbuf_destroy():
! * dbuf_destroy() ->...-> dbuf_evict_notify() -> dbuf_evict_one() ->
! * -> select a dbuf from cache -> dbuf_destroy() ->
! * -> mutex_enter(dn_dbufs_mtx of dnoneA)
 */
static void
dbuf_evict_notify(void)
{
--- 618,645 ----
*** 558,568 ****
    /*
     * We check if we should evict without holding the dbuf_evict_lock,
     * because it's OK to occasionally make the wrong decision here,
     * and grabbing the lock results in massive lock contention.
     */
!     if (refcount_count(&dbuf_cache_size) > dbuf_cache_max_bytes) {
        if (dbuf_cache_above_hiwater())
            dbuf_evict_one();
        cv_signal(&dbuf_evict_cv);
    }
}
--- 668,679 ----
    /*
     * We check if we should evict without holding the dbuf_evict_lock,
     * because it's OK to occasionally make the wrong decision here,
     * and grabbing the lock results in massive lock contention.
     */
!     if (refcount_count(&dbuf_caches[DB_DBUF_CACHE].size) >
!         dbuf_cache_max_bytes) {
        if (dbuf_cache_above_hiwater())
            dbuf_evict_one();
        cv_signal(&dbuf_evict_cv);
    }
}
*** 595,623 ****
    dbuf_kmem_cache = kmem_cache_create("dmu_buf_impl_t",
        sizeof (dmu_buf_impl_t),
        0, dbuf_cons, dbuf_dest, NULL, NULL, NULL, 0);

    for (i = 0; i < DBUF_MUTEXES; i++)
!         mutex_init(&h->hash_mutexes[i], NULL, MUTEX_DEFAULT, NULL);

    /*
!      * Setup the parameters for the dbuf cache. We cap the size of the
!      * dbuf cache to 1/32nd (default) of the size of the ARC.
     */
!     dbuf_cache_max_bytes = MIN(dbuf_cache_max_bytes,
!         arc_max_bytes() >> dbuf_cache_max_shift);

    /*
     * All entries are queued via taskq_dispatch_ent(), so min/maxalloc
     * configuration is not required.
     */
    dbu_evict_taskq = taskq_create("dbu_evict", 1, minclsyspri, 0, 0, 0);

!     dbuf_cache = multilist_create(sizeof (dmu_buf_impl_t),
        offsetof(dmu_buf_impl_t, db_cache_link),
        dbuf_cache_multilist_index_func);
!     refcount_create(&dbuf_cache_size);

    tsd_create(&zfs_dbuf_evict_key, NULL);
    dbuf_evict_thread_exit = B_FALSE;
    mutex_init(&dbuf_evict_lock, NULL, MUTEX_DEFAULT, NULL);
    cv_init(&dbuf_evict_cv, NULL, CV_DEFAULT, NULL);
--- 706,761 ----
    dbuf_kmem_cache = kmem_cache_create("dmu_buf_impl_t",
        sizeof (dmu_buf_impl_t),
        0, dbuf_cons, dbuf_dest, NULL, NULL, NULL, 0);

    for (i = 0; i < DBUF_MUTEXES; i++)
!         mutex_init(DBUF_HASH_MUTEX(h, i), NULL, MUTEX_DEFAULT, NULL);
+
    /*
!      * Setup the parameters for the dbuf caches. We set the sizes of the
!      * dbuf cache and the metadata cache to 1/32nd and 1/16th (default)
!      * of the size of the ARC, respectively.
     */
!     if (dbuf_cache_max_bytes == 0 ||
!         dbuf_cache_max_bytes >= arc_max_bytes()) {
!         dbuf_cache_max_bytes = arc_max_bytes() >> dbuf_cache_shift;
!     }
!     if (dbuf_metadata_cache_max_bytes == 0 ||
!         dbuf_metadata_cache_max_bytes >= arc_max_bytes()) {
!         dbuf_metadata_cache_max_bytes =
!             arc_max_bytes() >> dbuf_metadata_cache_shift;
!     }

    /*
+      * The combined size of both caches should be less
+      * the size of ARC, otherwise need to set them to
+      * the default values.
+      *
+      * divide by 2 is a simple overflow protection
+      */
+     if (((dbuf_cache_max_bytes / 2) +
+         (dbuf_metadata_cache_max_bytes / 2)) >= (arc_max_bytes() / 2)) {
+         dbuf_cache_max_bytes = arc_max_bytes() >> dbuf_cache_shift;
+         dbuf_metadata_cache_max_bytes =
+             arc_max_bytes() >> dbuf_metadata_cache_shift;
+     }
+
+
+     /*
     * All entries are queued via taskq_dispatch_ent(), so min/maxalloc
     * configuration is not required.
     */
    dbu_evict_taskq = taskq_create("dbu_evict", 1, minclsyspri, 0, 0, 0);

!     for (dbuf_cached_state_t dcs = 0; dcs < DB_CACHE_MAX; dcs++) {
!         dbuf_caches[dcs].cache =
!             multilist_create(sizeof (dmu_buf_impl_t),
            offsetof(dmu_buf_impl_t, db_cache_link),
            dbuf_cache_multilist_index_func);
!         refcount_create(&dbuf_caches[dcs].size);
!     }

    tsd_create(&zfs_dbuf_evict_key, NULL);
    dbuf_evict_thread_exit = B_FALSE;
    mutex_init(&dbuf_evict_lock, NULL, MUTEX_DEFAULT, NULL);
    cv_init(&dbuf_evict_cv, NULL, CV_DEFAULT, NULL);
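For reference, a rough standalone sketch of the default sizing arithmetic dbuf_init() performs in the hunk above when the tunables are left at 0. The 4 GB ARC maximum is an assumed figure chosen for illustration; the real code reads it from arc_max_bytes().

#include <stdio.h>
#include <stdint.h>

int
main(void)
{
    uint64_t arc_max = 4ULL << 30;      /* assumed arc_max_bytes(): 4 GB */
    int dbuf_cache_shift = 5;           /* default from the diff above */
    int dbuf_metadata_cache_shift = 6;  /* default from the diff above */

    /* arc_max >> 5 = 128 MB, arc_max >> 6 = 64 MB for the assumed ARC size */
    printf("dbuf_cache_max_bytes          = %llu MB\n",
        (unsigned long long)((arc_max >> dbuf_cache_shift) >> 20));
    printf("dbuf_metadata_cache_max_bytes = %llu MB\n",
        (unsigned long long)((arc_max >> dbuf_metadata_cache_shift) >> 20));
    return (0);
}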
*** 630,640 ****
{
    dbuf_hash_table_t *h = &dbuf_hash_table;
    int i;

    for (i = 0; i < DBUF_MUTEXES; i++)
!         mutex_destroy(&h->hash_mutexes[i]);
    kmem_free(h->hash_table, (h->hash_table_mask + 1) * sizeof (void *));
    kmem_cache_destroy(dbuf_kmem_cache);
    taskq_destroy(dbu_evict_taskq);

    mutex_enter(&dbuf_evict_lock);
--- 768,778 ----
{
    dbuf_hash_table_t *h = &dbuf_hash_table;
    int i;

    for (i = 0; i < DBUF_MUTEXES; i++)
!         mutex_destroy(DBUF_HASH_MUTEX(h, i));
    kmem_free(h->hash_table, (h->hash_table_mask + 1) * sizeof (void *));
    kmem_cache_destroy(dbuf_kmem_cache);
    taskq_destroy(dbu_evict_taskq);

    mutex_enter(&dbuf_evict_lock);
*** 647,658 ****
    tsd_destroy(&zfs_dbuf_evict_key);

    mutex_destroy(&dbuf_evict_lock);
    cv_destroy(&dbuf_evict_cv);

!     refcount_destroy(&dbuf_cache_size);
!     multilist_destroy(dbuf_cache);
}

/*
 * Other stuff.
 */
--- 785,798 ----
    tsd_destroy(&zfs_dbuf_evict_key);

    mutex_destroy(&dbuf_evict_lock);
    cv_destroy(&dbuf_evict_cv);

!     for (dbuf_cached_state_t dcs = 0; dcs < DB_CACHE_MAX; dcs++) {
!         refcount_destroy(&dbuf_caches[dcs].size);
!         multilist_destroy(dbuf_caches[dcs].cache);
!     }
}

/*
 * Other stuff.
 */
*** 1412,1422 ****
/*
 * We already have a dirty record for this TXG, and we are being
 * dirtied again.
 */
static void
! dbuf_redirty(dbuf_dirty_record_t *dr)
{
    dmu_buf_impl_t *db = dr->dr_dbuf;

    ASSERT(MUTEX_HELD(&db->db_mtx));
--- 1552,1562 ----
/*
 * We already have a dirty record for this TXG, and we are being
 * dirtied again.
 */
static void
! dbuf_redirty(dbuf_dirty_record_t *dr, boolean_t usesc)
{
    dmu_buf_impl_t *db = dr->dr_dbuf;

    ASSERT(MUTEX_HELD(&db->db_mtx));
*** 1431,1444 ****
            /* Already released on initial dirty, so just thaw. */
            ASSERT(arc_released(db->db_buf));
            arc_buf_thaw(db->db_buf);
        }
    }
}

dbuf_dirty_record_t *
! dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
{
    dnode_t *dn;
    objset_t *os;
    dbuf_dirty_record_t **drp, *dr;
    int drop_struct_lock = FALSE;
--- 1571,1589 ----
            /* Already released on initial dirty, so just thaw. */
            ASSERT(arc_released(db->db_buf));
            arc_buf_thaw(db->db_buf);
        }
    }
+
+     /*
+      * Special class usage of dirty dbuf could be changed,
+      * update the dirty entry.
+      */
+     dr->dr_usesc = usesc;
}

dbuf_dirty_record_t *
! dbuf_dirty_sc(dmu_buf_impl_t *db, dmu_tx_t *tx, boolean_t usesc)
{
    dnode_t *dn;
    objset_t *os;
    dbuf_dirty_record_t **drp, *dr;
    int drop_struct_lock = FALSE;
*** 1521,1531 ****
    while ((dr = *drp) != NULL && dr->dr_txg > tx->tx_txg)
        drp = &dr->dr_next;
    if (dr && dr->dr_txg == tx->tx_txg) {
        DB_DNODE_EXIT(db);

!         dbuf_redirty(dr);
        mutex_exit(&db->db_mtx);
        return (dr);
    }

    /*
--- 1666,1676 ----
    while ((dr = *drp) != NULL && dr->dr_txg > tx->tx_txg)
        drp = &dr->dr_next;
    if (dr && dr->dr_txg == tx->tx_txg) {
        DB_DNODE_EXIT(db);

!         dbuf_redirty(dr, usesc);
        mutex_exit(&db->db_mtx);
        return (dr);
    }

    /*
*** 1601,1610 ****
--- 1746,1756 ----
    if (db->db_blkid != DMU_BONUS_BLKID && os->os_dsl_dataset != NULL)
        dr->dr_accounted = db->db.db_size;
    dr->dr_dbuf = db;
    dr->dr_txg = tx->tx_txg;
    dr->dr_next = *drp;
+     dr->dr_usesc = usesc;
    *drp = dr;

    /*
     * We could have been freed_in_flight between the dbuf_noread
     * and dbuf_dirty.  We win, as though the dbuf_noread() had
*** 1634,1644 ****
        db->db_blkid == DMU_SPILL_BLKID) {
        mutex_enter(&dn->dn_mtx);
        ASSERT(!list_link_active(&dr->dr_dirty_node));
        list_insert_tail(&dn->dn_dirty_records[txgoff], dr);
        mutex_exit(&dn->dn_mtx);
!         dnode_setdirty(dn, tx);
        DB_DNODE_EXIT(db);
        return (dr);
    }

    /*
--- 1780,1790 ----
        db->db_blkid == DMU_SPILL_BLKID) {
        mutex_enter(&dn->dn_mtx);
        ASSERT(!list_link_active(&dr->dr_dirty_node));
        list_insert_tail(&dn->dn_dirty_records[txgoff], dr);
        mutex_exit(&dn->dn_mtx);
!         dnode_setdirty_sc(dn, tx, usesc);
        DB_DNODE_EXIT(db);
        return (dr);
    }

    /*
*** 1669,1679 ****
     * syncing context won't have to wait for the i/o.
     */
    ddt_prefetch(os->os_spa, db->db_blkptr);

    if (db->db_level == 0) {
!         dnode_new_blkid(dn, db->db_blkid, tx, drop_struct_lock);
        ASSERT(dn->dn_maxblkid >= db->db_blkid);
    }

    if (db->db_level+1 < dn->dn_nlevels) {
        dmu_buf_impl_t *parent = db->db_parent;
--- 1815,1825 ----
     * syncing context won't have to wait for the i/o.
     */
    ddt_prefetch(os->os_spa, db->db_blkptr);

    if (db->db_level == 0) {
!         dnode_new_blkid(dn, db->db_blkid, tx, usesc, drop_struct_lock);
        ASSERT(dn->dn_maxblkid >= db->db_blkid);
    }

    if (db->db_level+1 < dn->dn_nlevels) {
        dmu_buf_impl_t *parent = db->db_parent;
*** 1689,1699 ****
            parent_held = TRUE;
        }
        if (drop_struct_lock)
            rw_exit(&dn->dn_struct_rwlock);
        ASSERT3U(db->db_level+1, ==, parent->db_level);
!         di = dbuf_dirty(parent, tx);
        if (parent_held)
            dbuf_rele(parent, FTAG);

        mutex_enter(&db->db_mtx);
        /*
--- 1835,1845 ----
            parent_held = TRUE;
        }
        if (drop_struct_lock)
            rw_exit(&dn->dn_struct_rwlock);
        ASSERT3U(db->db_level+1, ==, parent->db_level);
!         di = dbuf_dirty_sc(parent, tx, usesc);
        if (parent_held)
            dbuf_rele(parent, FTAG);

        mutex_enter(&db->db_mtx);
        /*
*** 1707,1716 ****
--- 1853,1868 ----
            ASSERT(!list_link_active(&dr->dr_dirty_node));
            list_insert_tail(&di->dt.di.dr_children, dr);
            mutex_exit(&di->dt.di.dr_mtx);
            dr->dr_parent = di;
        }
+
+         /*
+          * Special class usage of dirty dbuf could be changed,
+          * update the dirty entry.
+          */
+         dr->dr_usesc = usesc;
        mutex_exit(&db->db_mtx);
    } else {
        ASSERT(db->db_level+1 == dn->dn_nlevels);
        ASSERT(db->db_blkid < dn->dn_nblkptr);
        ASSERT(db->db_parent == NULL || db->db_parent == dn->dn_dbuf);
*** 1720,1734 ****
        mutex_exit(&dn->dn_mtx);
        if (drop_struct_lock)
            rw_exit(&dn->dn_struct_rwlock);
    }

!     dnode_setdirty(dn, tx);
    DB_DNODE_EXIT(db);
    return (dr);
}

/*
 * Undirty a buffer in the transaction group referenced by the given
 * transaction.  Return whether this evicted the dbuf.
 */
static boolean_t
--- 1872,1897 ----
        mutex_exit(&dn->dn_mtx);
        if (drop_struct_lock)
            rw_exit(&dn->dn_struct_rwlock);
    }

!     dnode_setdirty_sc(dn, tx, usesc);
    DB_DNODE_EXIT(db);
    return (dr);
}

+ dbuf_dirty_record_t *
+ dbuf_dirty(dmu_buf_impl_t *db, dmu_tx_t *tx)
+ {
+     spa_t *spa;
+
+     ASSERT(db->db_objset != NULL);
+     spa = db->db_objset->os_spa;
+
+     return (dbuf_dirty_sc(db, tx, spa->spa_usesc));
+ }
+
/*
 * Undirty a buffer in the transaction group referenced by the given
 * transaction.  Return whether this evicted the dbuf.
 */
static boolean_t
*** 1820,1829 ****
--- 1983,2000 ----
void
dmu_buf_will_dirty(dmu_buf_t *db_fake, dmu_tx_t *tx)
{
    dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;
+     spa_t *spa = db->db_objset->os_spa;
+
+     dmu_buf_will_dirty_sc(db_fake, tx, spa->spa_usesc);
+ }
+
+ void
+ dmu_buf_will_dirty_sc(dmu_buf_t *db_fake, dmu_tx_t *tx, boolean_t usesc)
+ {
+     dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;
    int rf = DB_RF_MUST_SUCCEED | DB_RF_NOPREFETCH;

    ASSERT(tx->tx_txg != 0);
    ASSERT(!refcount_is_zero(&db->db_holds));
*** 1842,1852 ****
         * because there are some calls to dbuf_dirty() that don't
         * go through dmu_buf_will_dirty().
         */
        if (dr->dr_txg == tx->tx_txg && db->db_state == DB_CACHED) {
            /* This dbuf is already dirty and cached. */
!             dbuf_redirty(dr);
            mutex_exit(&db->db_mtx);
            return;
        }
    }
    mutex_exit(&db->db_mtx);
--- 2013,2023 ----
         * because there are some calls to dbuf_dirty() that don't
         * go through dmu_buf_will_dirty().
         */
        if (dr->dr_txg == tx->tx_txg && db->db_state == DB_CACHED) {
            /* This dbuf is already dirty and cached. */
!             dbuf_redirty(dr, usesc);
            mutex_exit(&db->db_mtx);
            return;
        }
    }
    mutex_exit(&db->db_mtx);
*** 1854,1866 ****
    DB_DNODE_ENTER(db);
    if (RW_WRITE_HELD(&DB_DNODE(db)->dn_struct_rwlock))
        rf |= DB_RF_HAVESTRUCT;
    DB_DNODE_EXIT(db);
    (void) dbuf_read(db, NULL, rf);
!     (void) dbuf_dirty(db, tx);
}

void
dmu_buf_will_not_fill(dmu_buf_t *db_fake, dmu_tx_t *tx)
{
    dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;
--- 2025,2038 ----
    DB_DNODE_ENTER(db);
    if (RW_WRITE_HELD(&DB_DNODE(db)->dn_struct_rwlock))
        rf |= DB_RF_HAVESTRUCT;
    DB_DNODE_EXIT(db);
    (void) dbuf_read(db, NULL, rf);
!     (void) dbuf_dirty_sc(db, tx, usesc);
}

+
void
dmu_buf_will_not_fill(dmu_buf_t *db_fake, dmu_tx_t *tx)
{
    dmu_buf_impl_t *db = (dmu_buf_impl_t *)db_fake;
*** 2031,2043 ****
    }

    dbuf_clear_data(db);

    if (multilist_link_active(&db->db_cache_link)) {
!         multilist_remove(dbuf_cache, db);
!         (void) refcount_remove_many(&dbuf_cache_size,
            db->db.db_size, db);
    }

    ASSERT(db->db_state == DB_UNCACHED || db->db_state == DB_NOFILL);
    ASSERT(db->db_data_pending == NULL);
--- 2203,2221 ----
    }

    dbuf_clear_data(db);

    if (multilist_link_active(&db->db_cache_link)) {
!         ASSERT(db->db_caching_status == DB_DBUF_CACHE ||
!             db->db_caching_status == DB_DBUF_METADATA_CACHE);
!
!         multilist_remove(dbuf_caches[db->db_caching_status].cache, db);
!         (void) refcount_remove_many(
!             &dbuf_caches[db->db_caching_status].size,
            db->db.db_size, db);
+
+         db->db_caching_status = DB_NO_CACHE;
    }

    ASSERT(db->db_state == DB_UNCACHED || db->db_state == DB_NOFILL);
    ASSERT(db->db_data_pending == NULL);
*** 2087,2096 ****
--- 2265,2275 ----
    ASSERT(db->db_buf == NULL);
    ASSERT(db->db.db_data == NULL);
    ASSERT(db->db_hash_next == NULL);
    ASSERT(db->db_blkptr == NULL);
    ASSERT(db->db_data_pending == NULL);
+     ASSERT3U(db->db_caching_status, ==, DB_NO_CACHE);
    ASSERT(!multilist_link_active(&db->db_cache_link));

    kmem_cache_free(dbuf_kmem_cache, db);
    arc_space_return(sizeof (dmu_buf_impl_t), ARC_SPACE_OTHER);
*** 2225,2234 ****
--- 2404,2414 ----
        db->db.db_size = DN_MAX_BONUSLEN -
            (dn->dn_nblkptr-1) * sizeof (blkptr_t);
        ASSERT3U(db->db.db_size, >=, dn->dn_bonuslen);
        db->db.db_offset = DMU_BONUS_BLKID;
        db->db_state = DB_UNCACHED;
+         db->db_caching_status = DB_NO_CACHE;
        /* the bonus dbuf is not placed in the hash table */
        arc_space_consume(sizeof (dmu_buf_impl_t), ARC_SPACE_OTHER);
        return (db);
    } else if (blkid == DMU_SPILL_BLKID) {
        db->db.db_size = (blkptr != NULL) ?
*** 2257,2266 ****
--- 2437,2447 ----
        return (odb);
    }
    avl_add(&dn->dn_dbufs, db);

    db->db_state = DB_UNCACHED;
+     db->db_caching_status = DB_NO_CACHE;
    mutex_exit(&dn->dn_dbufs_mtx);
    arc_space_consume(sizeof (dmu_buf_impl_t), ARC_SPACE_OTHER);

    if (parent && parent != dn->dn_dbuf)
        dbuf_add_ref(parent, db);
*** 2563,2574 ****
    if (fail_uncached && db->db_state != DB_CACHED) {
        mutex_exit(&db->db_mtx);
        return (SET_ERROR(ENOENT));
    }

!     if (db->db_buf != NULL)
        ASSERT3P(db->db.db_data, ==, db->db_buf->b_data);

    ASSERT(db->db_buf == NULL || arc_referenced(db->db_buf));

    /*
     * If this buffer is currently syncing out, and we are are
--- 2744,2757 ----
    if (fail_uncached && db->db_state != DB_CACHED) {
        mutex_exit(&db->db_mtx);
        return (SET_ERROR(ENOENT));
    }

!     if (db->db_buf != NULL) {
!         arc_buf_access(db->db_buf);
        ASSERT3P(db->db.db_data, ==, db->db_buf->b_data);
+     }

    ASSERT(db->db_buf == NULL || arc_referenced(db->db_buf));

    /*
     * If this buffer is currently syncing out, and we are are
*** 2591,2603 ****
        }
    }

    if (multilist_link_active(&db->db_cache_link)) {
        ASSERT(refcount_is_zero(&db->db_holds));
!         multilist_remove(dbuf_cache, db);
!         (void) refcount_remove_many(&dbuf_cache_size,
            db->db.db_size, db);
    }

    (void) refcount_add(&db->db_holds, tag);
    DBUF_VERIFY(db);
    mutex_exit(&db->db_mtx);
--- 2774,2792 ----
        }
    }

    if (multilist_link_active(&db->db_cache_link)) {
        ASSERT(refcount_is_zero(&db->db_holds));
!         ASSERT(db->db_caching_status == DB_DBUF_CACHE ||
!             db->db_caching_status == DB_DBUF_METADATA_CACHE);
!
!         multilist_remove(dbuf_caches[db->db_caching_status].cache, db);
!         (void) refcount_remove_many(
!             &dbuf_caches[db->db_caching_status].size,
            db->db.db_size, db);
+
+         db->db_caching_status = DB_NO_CACHE;
    }

    (void) refcount_add(&db->db_holds, tag);
    DBUF_VERIFY(db);
    mutex_exit(&db->db_mtx);
*** 2810,2826 ****
            if (!DBUF_IS_CACHEABLE(db) ||
                db->db_pending_evict) {
                dbuf_destroy(db);
            } else if (!multilist_link_active(&db->db_cache_link)) {
!                 multilist_insert(dbuf_cache, db);
!                 (void) refcount_add_many(&dbuf_cache_size,
                    db->db.db_size, db);
                mutex_exit(&db->db_mtx);

                dbuf_evict_notify();
            }

            if (do_arc_evict)
                arc_freed(spa, &bp);
        }
    } else {
--- 2999,3025 ----
            if (!DBUF_IS_CACHEABLE(db) ||
                db->db_pending_evict) {
                dbuf_destroy(db);
            } else if (!multilist_link_active(&db->db_cache_link)) {
!                 ASSERT3U(db->db_caching_status, ==,
!                     DB_NO_CACHE);
!
!                 dbuf_cached_state_t dcs =
!                     dbuf_include_in_metadata_cache(db) ?
!                     DB_DBUF_METADATA_CACHE : DB_DBUF_CACHE;
!                 db->db_caching_status = dcs;
!
!                 multilist_insert(dbuf_caches[dcs].cache, db);
!                 (void) refcount_add_many(&dbuf_caches[dcs].size,
                    db->db.db_size, db);
                mutex_exit(&db->db_mtx);

+                 if (db->db_caching_status == DB_DBUF_CACHE) {
                    dbuf_evict_notify();
                }
+             }

            if (do_arc_evict)
                arc_freed(spa, &bp);
        }
    } else {
*** 2998,3008 ****
    /* Provide the pending dirty record to child dbufs */
    db->db_data_pending = dr;

    mutex_exit(&db->db_mtx);

-     dbuf_write(dr, db->db_buf, tx);
    zio = dr->dr_zio;
    mutex_enter(&dr->dt.di.dr_mtx);
    dbuf_sync_list(&dr->dt.di.dr_children, db->db_level - 1, tx);
--- 3197,3206 ----
*** 3470,3614 ****
    if (zio->io_abd != NULL)
        abd_put(zio->io_abd);
}

- typedef struct dbuf_remap_impl_callback_arg {
-     objset_t    *drica_os;
-     uint64_t    drica_blk_birth;
-     dmu_tx_t    *drica_tx;
- } dbuf_remap_impl_callback_arg_t;
-
- static void
- dbuf_remap_impl_callback(uint64_t vdev, uint64_t offset, uint64_t size,
-     void *arg)
- {
-     dbuf_remap_impl_callback_arg_t *drica = arg;
-     objset_t *os = drica->drica_os;
-     spa_t *spa = dmu_objset_spa(os);
-     dmu_tx_t *tx = drica->drica_tx;
-
-     ASSERT(dsl_pool_sync_context(spa_get_dsl(spa)));
-
-     if (os == spa_meta_objset(spa)) {
-         spa_vdev_indirect_mark_obsolete(spa, vdev, offset, size, tx);
-     } else {
-         dsl_dataset_block_remapped(dmu_objset_ds(os), vdev, offset,
-             size, drica->drica_blk_birth, tx);
-     }
- }
-
- static void
- dbuf_remap_impl(dnode_t *dn, blkptr_t *bp, dmu_tx_t *tx)
- {
-     blkptr_t bp_copy = *bp;
-     spa_t *spa = dmu_objset_spa(dn->dn_objset);
-     dbuf_remap_impl_callback_arg_t drica;
-
-     ASSERT(dsl_pool_sync_context(spa_get_dsl(spa)));
-
-     drica.drica_os = dn->dn_objset;
-     drica.drica_blk_birth = bp->blk_birth;
-     drica.drica_tx = tx;
-     if (spa_remap_blkptr(spa, &bp_copy, dbuf_remap_impl_callback,
-         &drica)) {
-         /*
-          * The struct_rwlock prevents dbuf_read_impl() from
-          * dereferencing the BP while we are changing it.  To
-          * avoid lock contention, only grab it when we are actually
-          * changing the BP.
-          */
-         rw_enter(&dn->dn_struct_rwlock, RW_WRITER);
-         *bp = bp_copy;
-         rw_exit(&dn->dn_struct_rwlock);
-     }
- }
-
- /*
- * Returns true if a dbuf_remap would modify the dbuf. We do this by attempting
- * to remap a copy of every bp in the dbuf.
- */
- boolean_t
- dbuf_can_remap(const dmu_buf_impl_t *db)
- {
-     spa_t *spa = dmu_objset_spa(db->db_objset);
-     blkptr_t *bp = db->db.db_data;
-     boolean_t ret = B_FALSE;
-
-     ASSERT3U(db->db_level, >, 0);
-     ASSERT3S(db->db_state, ==, DB_CACHED);
-
-     ASSERT(spa_feature_is_active(spa, SPA_FEATURE_DEVICE_REMOVAL));
-
-     spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER);
-     for (int i = 0; i < db->db.db_size >> SPA_BLKPTRSHIFT; i++) {
-         blkptr_t bp_copy = bp[i];
-         if (spa_remap_blkptr(spa, &bp_copy, NULL, NULL)) {
-             ret = B_TRUE;
-             break;
-         }
-     }
-     spa_config_exit(spa, SCL_VDEV, FTAG);
-
-     return (ret);
- }
-
- boolean_t
- dnode_needs_remap(const dnode_t *dn)
- {
-     spa_t *spa = dmu_objset_spa(dn->dn_objset);
-     boolean_t ret = B_FALSE;
-
-     if (dn->dn_phys->dn_nlevels == 0) {
-         return (B_FALSE);
-     }
-
-     ASSERT(spa_feature_is_active(spa, SPA_FEATURE_DEVICE_REMOVAL));
-
-     spa_config_enter(spa, SCL_VDEV, FTAG, RW_READER);
-     for (int j = 0; j < dn->dn_phys->dn_nblkptr; j++) {
-         blkptr_t bp_copy = dn->dn_phys->dn_blkptr[j];
-         if (spa_remap_blkptr(spa, &bp_copy, NULL, NULL)) {
-             ret = B_TRUE;
-             break;
-         }
-     }
-     spa_config_exit(spa, SCL_VDEV, FTAG);
-
-     return (ret);
- }
-
- /*
- * Remap any existing BP's to concrete vdevs, if possible.
- */
- static void
- dbuf_remap(dnode_t *dn, dmu_buf_impl_t *db, dmu_tx_t *tx)
- {
-     spa_t *spa = dmu_objset_spa(db->db_objset);
-     ASSERT(dsl_pool_sync_context(spa_get_dsl(spa)));
-
-     if (!spa_feature_is_active(spa, SPA_FEATURE_DEVICE_REMOVAL))
-         return;
-
-     if (db->db_level > 0) {
-         blkptr_t *bp = db->db.db_data;
-         for (int i = 0; i < db->db.db_size >> SPA_BLKPTRSHIFT; i++) {
-             dbuf_remap_impl(dn, &bp[i], tx);
-         }
-     } else if (db->db.db_object == DMU_META_DNODE_OBJECT) {
-         dnode_phys_t *dnp = db->db.db_data;
-         ASSERT3U(db->db_dnode_handle->dnh_dnode->dn_type, ==,
-             DMU_OT_DNODE);
-         for (int i = 0; i < db->db.db_size >> DNODE_SHIFT; i++) {
-             for (int j = 0; j < dnp[i].dn_nblkptr; j++) {
-                 dbuf_remap_impl(dn, &dnp[i].dn_blkptr[j], tx);
-             }
-         }
-     }
- }
-
- 
/* Issue I/O to commit a dirty buffer to disk. */
static void
dbuf_write(dbuf_dirty_record_t *dr, arc_buf_t *data, dmu_tx_t *tx)
{
    dmu_buf_impl_t *db = dr->dr_dbuf;
--- 3668,3677 ----
*** 3618,3634 ****
--- 3681,3700 ----
    uint64_t txg = tx->tx_txg;
    zbookmark_phys_t zb;
    zio_prop_t zp;
    zio_t *zio;
    int wp_flag = 0;
+     zio_smartcomp_info_t sc;

    ASSERT(dmu_tx_is_syncing(tx));

    DB_DNODE_ENTER(db);
    dn = DB_DNODE(db);
    os = dn->dn_objset;

+     dnode_setup_zio_smartcomp(db, &sc);
+
    if (db->db_state != DB_NOFILL) {
        if (db->db_level > 0 || dn->dn_type == DMU_OT_DNODE) {
            /*
             * Private object buffers are released here rather
             * than in dbuf_dirty() since they are only modified
*** 3638,3648 ****
            if (BP_IS_HOLE(db->db_blkptr)) {
                arc_buf_thaw(data);
            } else {
                dbuf_release_bp(db);
            }
-             dbuf_remap(dn, db, tx);
        }
    }

    if (parent != dn->dn_dbuf) {
        /* Our parent is an indirect block. */
--- 3704,3713 ----
*** 3676,3685 ****
--- 3741,3751 ----
        db->db.db_object, db->db_level, db->db_blkid);

    if (db->db_blkid == DMU_SPILL_BLKID)
        wp_flag = WP_SPILL;
    wp_flag |= (db->db_state == DB_NOFILL) ? WP_NOFILL : 0;
+     WP_SET_SPECIALCLASS(wp_flag, dr->dr_usesc);

    dmu_write_policy(os, dn, db->db_level, wp_flag, &zp);
    DB_DNODE_EXIT(db);

    /*
*** 3701,3711 ****
        dr->dr_zio = zio_write(zio, os->os_spa, txg, &dr->dr_bp_copy,
            contents, db->db.db_size, db->db.db_size, &zp,
            dbuf_write_override_ready, NULL, NULL,
            dbuf_write_override_done,
!             dr, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb);
        mutex_enter(&db->db_mtx);
        dr->dt.dl.dr_override_state = DR_NOT_OVERRIDDEN;
        zio_write_override(dr->dr_zio, &dr->dt.dl.dr_overridden_by,
            dr->dt.dl.dr_copies, dr->dt.dl.dr_nopwrite);
        mutex_exit(&db->db_mtx);
--- 3767,3778 ----
        dr->dr_zio = zio_write(zio, os->os_spa, txg, &dr->dr_bp_copy,
            contents, db->db.db_size, db->db.db_size, &zp,
            dbuf_write_override_ready, NULL, NULL,
            dbuf_write_override_done,
!             dr, ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb,
!             &sc);
        mutex_enter(&db->db_mtx);
        dr->dt.dl.dr_override_state = DR_NOT_OVERRIDDEN;
        zio_write_override(dr->dr_zio, &dr->dt.dl.dr_overridden_by,
            dr->dt.dl.dr_copies, dr->dt.dl.dr_nopwrite);
        mutex_exit(&db->db_mtx);
*** 3715,3725 ****
        dr->dr_zio = zio_write(zio, os->os_spa, txg, &dr->dr_bp_copy,
            NULL, db->db.db_size, db->db.db_size, &zp,
            dbuf_write_nofill_ready, NULL, NULL,
            dbuf_write_nofill_done, db,
            ZIO_PRIORITY_ASYNC_WRITE,
!             ZIO_FLAG_MUSTSUCCEED | ZIO_FLAG_NODATA, &zb);
    } else {
        ASSERT(arc_released(data));

        /*
         * For indirect blocks, we want to setup the children
--- 3782,3792 ----
        dr->dr_zio = zio_write(zio, os->os_spa, txg, &dr->dr_bp_copy,
            NULL, db->db.db_size, db->db.db_size, &zp,
            dbuf_write_nofill_ready, NULL, NULL,
            dbuf_write_nofill_done, db,
            ZIO_PRIORITY_ASYNC_WRITE,
!             ZIO_FLAG_MUSTSUCCEED | ZIO_FLAG_NODATA, &zb, &sc);
    } else {
        ASSERT(arc_released(data));

        /*
         * For indirect blocks, we want to setup the children
*** 3732,3739 ****
        dr->dr_zio = arc_write(zio, os->os_spa, txg,
            &dr->dr_bp_copy, data, DBUF_IS_L2CACHEABLE(db),
            &zp, dbuf_write_ready, children_ready_cb,
            dbuf_write_physdone, dbuf_write_done, db,
!             ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb);
    }
}
--- 3799,3806 ----
        dr->dr_zio = arc_write(zio, os->os_spa, txg,
            &dr->dr_bp_copy, data, DBUF_IS_L2CACHEABLE(db),
            &zp, dbuf_write_ready, children_ready_cb,
            dbuf_write_physdone, dbuf_write_done, db,
!             ZIO_PRIORITY_ASYNC_WRITE, ZIO_FLAG_MUSTSUCCEED, &zb, &sc);
    }
}