8493 kmem_move taskq appears to be inducing significant system latency
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Dan McDonald <danmcd@joyent.com>

          --- old/usr/src/uts/common/os/kmem.c
          +++ new/usr/src/uts/common/os/kmem.c
↓ open down ↓ 151 lines elided ↑ open up ↑
 152  152   *            with the new object (the unused copy destination). This response
 153  153   *            is the client's opportunity to be a model citizen and give back as
 154  154   *            much as it can.
 155  155   * DONT_KNOW: The client does not know about the object because
 156  156   *            a) the client has just allocated the object and not yet put it
 157  157   *               wherever it expects to find known objects
 158  158   *            b) the client has removed the object from wherever it expects to
 159  159   *               find known objects and is about to free it, or
 160  160   *            c) the client has freed the object.
 161  161   *            In all these cases (a, b, and c) kmem frees the new object (the
 162      - *            unused copy destination) and searches for the old object in the
 163      - *            magazine layer. If found, the object is removed from the magazine
 164      - *            layer and freed to the slab layer so it will no longer hold the
 165      - *            slab hostage.
      162 + *            unused copy destination).  In the first case, the object is in
      163 + *            use and the correct action is that for LATER; in the latter two
      164 + *            cases, we know that the object is either freed or about to be
      165 + *            freed, in which case it is either already in a magazine or about
      166 + *            to be in one.  In these cases, we know that the object will either
      167 + *            be reallocated and reused, or it will end up in a full magazine
      168 + *            that will be reaped (thereby liberating the slab).  Because it
      169 + *            is prohibitively expensive to differentiate these cases, and
      170 + *            because the defrag code is executed when we're low on memory
       171 + *            (thereby biasing the system to reclaim full magazines), we treat
      172 + *            all DONT_KNOW cases as LATER and rely on cache reaping to
      173 + *            generally clean up full magazines.  While we take the same action
      174 + *            for these cases, we maintain their semantic distinction:  if
      175 + *            defragmentation is not occurring, it is useful to know if this
      176 + *            is due to objects in use (LATER) or objects in an unknown state
      177 + *            of transition (DONT_KNOW).
 166  178   *
 167  179   * 2.3 Object States
 168  180   *
 169  181   * Neither kmem nor the client can be assumed to know the object's whereabouts
 170  182   * at the time of the callback. An object belonging to a kmem cache may be in
 171  183   * any of the following states:
 172  184   *
 173  185   * 1. Uninitialized on the slab
 174  186   * 2. Allocated from the slab but not constructed (still uninitialized)
 175  187   * 3. Allocated from the slab, constructed, but not yet ready for business
↓ open down ↓ 102 lines elided ↑ open up ↑
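
The response protocol above is easiest to see from the client's side. The sketch below is purely illustrative and is not part of this change: the object type and helper predicates (object_t, object_is_known(), and friends) are hypothetical stand-ins for whatever bookkeeping a real client keeps. It shows where each KMEM_CBRC_* response would typically be returned, including the DONT_KNOW answer for objects caught between allocation and insertion, or between removal and free, as described above.

    /*
     * Illustrative client move callback (hypothetical client code).
     * The signature matches the move callback registered with
     * kmem_cache_set_move(); everything prefixed with object_ is a
     * placeholder.
     */
    static kmem_cbrc_t
    object_move(void *old, void *new, size_t size, void *arg)
    {
            object_t *op = old, *np = new;

            /*
             * If the object cannot be found in the client's own
             * structures, it is newly allocated, about to be freed, or
             * already freed: answer DONT_KNOW and let cache reaping
             * liberate the slab, as the comment above describes.
             */
            if (!object_is_known(op))
                    return (KMEM_CBRC_DONT_KNOW);

            if (!object_still_needed(op))
                    return (KMEM_CBRC_DONT_NEED);   /* kmem frees both copies */

            if (object_is_busy(op))
                    return (KMEM_CBRC_LATER);       /* worth asking again later */

            if (object_is_pinned(op))
                    return (KMEM_CBRC_NO);          /* stop asking about this slab */

            /* Move the contents and swap the new copy into place. */
            bcopy(op, np, size);
            object_replace(op, np);
            return (KMEM_CBRC_YES);                 /* kmem frees the old copy */
    }
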
 278  290   *      object->o_container = (void *)((uintptr_t)object->o_container | 0x1);
 279  291   *      list_remove(&container->c_objects, object);
 280  292   *      mutex_exit(&container->c_objects_lock);
 281  293   *
 282  294   * In the common case, the object is freed to the magazine layer, where it may
 283  295   * be reused on a subsequent allocation without the overhead of calling the
 284  296   * constructor. While in the magazine it appears allocated from the point of
 285  297   * view of the slab layer, making it a candidate for the move callback. Most
 286  298   * objects unrecognized by the client in the move callback fall into this
 287  299   * category and are cheaply distinguished from known objects by the test
 288      - * described earlier. Since recognition is cheap for the client, and searching
 289      - * magazines is expensive for kmem, kmem defers searching until the client first
 290      - * returns KMEM_CBRC_DONT_KNOW. As long as the needed effort is reasonable, kmem
 291      - * elsewhere does what it can to avoid bothering the client unnecessarily.
      300 + * described earlier. Because searching magazines is prohibitively expensive
      301 + * for kmem, clients that do not mark freed objects (and therefore return
      302 + * KMEM_CBRC_DONT_KNOW for large numbers of objects) may find defragmentation
      303 + * efficacy reduced.
 292  304   *
 293  305   * Invalidating the designated pointer member before freeing the object marks
 294  306   * the object to be avoided in the callback, and conversely, assigning a valid
 295  307   * value to the designated pointer member after allocating the object makes the
 296  308   * object fair game for the callback:
 297  309   *
 298  310   *      ... allocate object ...
 299  311   *      ... set any initial state not set by the constructor ...
 300  312   *
 301  313   *      mutex_enter(&container->c_objects_lock);
↓ open down ↓ 730 lines elided ↑ open up ↑
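
The "test described earlier" is the cheap recognition check on the designated pointer member. As an illustrative sketch only (reusing the o_container convention from the surrounding comment and the hypothetical object_is_known() helper from the sketch above), the whole test amounts to one load and one mask, with no searching of client data structures:

    /*
     * Illustrative recognition test (not part of this change).  A NULL
     * or low-bit-tagged o_container means the object is freed, about
     * to be freed, or not yet published, so the move callback should
     * answer KMEM_CBRC_DONT_KNOW.
     */
    static boolean_t
    object_is_known(const object_t *op)
    {
            uintptr_t container = (uintptr_t)op->o_container;

            return (container != 0 && (container & 0x1) == 0);
    }
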
1032 1044  static vmem_t           *kmem_cache_arena;
1033 1045  static vmem_t           *kmem_hash_arena;
1034 1046  static vmem_t           *kmem_log_arena;
1035 1047  static vmem_t           *kmem_oversize_arena;
1036 1048  static vmem_t           *kmem_va_arena;
1037 1049  static vmem_t           *kmem_default_arena;
1038 1050  static vmem_t           *kmem_firewall_va_arena;
1039 1051  static vmem_t           *kmem_firewall_arena;
1040 1052  
1041 1053  /*
1042      - * Define KMEM_STATS to turn on statistic gathering. By default, it is only
1043      - * turned on when DEBUG is also defined.
1044      - */
1045      -#ifdef  DEBUG
1046      -#define KMEM_STATS
1047      -#endif  /* DEBUG */
1048      -
1049      -#ifdef  KMEM_STATS
1050      -#define KMEM_STAT_ADD(stat)                     ((stat)++)
1051      -#define KMEM_STAT_COND_ADD(cond, stat)          ((void) (!(cond) || (stat)++))
1052      -#else
1053      -#define KMEM_STAT_ADD(stat)                     /* nothing */
1054      -#define KMEM_STAT_COND_ADD(cond, stat)          /* nothing */
1055      -#endif  /* KMEM_STATS */
1056      -
1057      -/*
1058 1054   * kmem slab consolidator thresholds (tunables)
1059 1055   */
1060 1056  size_t kmem_frag_minslabs = 101;        /* minimum total slabs */
1061 1057  size_t kmem_frag_numer = 1;             /* free buffers (numerator) */
1062 1058  size_t kmem_frag_denom = KMEM_VOID_FRACTION; /* buffers (denominator) */
1063 1059  /*
1064 1060   * Maximum number of slabs from which to move buffers during a single
1065 1061   * maintenance interval while the system is not low on memory.
1066 1062   */
1067 1063  size_t kmem_reclaim_max_slabs = 1;
1068 1064  /*
1069 1065   * Number of slabs to scan backwards from the end of the partial slab list
1070 1066   * when searching for buffers to relocate.
1071 1067   */
1072 1068  size_t kmem_reclaim_scan_range = 12;
1073 1069  
1074      -#ifdef  KMEM_STATS
1075      -static struct {
1076      -        uint64_t kms_callbacks;
1077      -        uint64_t kms_yes;
1078      -        uint64_t kms_no;
1079      -        uint64_t kms_later;
1080      -        uint64_t kms_dont_need;
1081      -        uint64_t kms_dont_know;
1082      -        uint64_t kms_hunt_found_mag;
1083      -        uint64_t kms_hunt_found_slab;
1084      -        uint64_t kms_hunt_alloc_fail;
1085      -        uint64_t kms_hunt_lucky;
1086      -        uint64_t kms_notify;
1087      -        uint64_t kms_notify_callbacks;
1088      -        uint64_t kms_disbelief;
1089      -        uint64_t kms_already_pending;
1090      -        uint64_t kms_callback_alloc_fail;
1091      -        uint64_t kms_callback_taskq_fail;
1092      -        uint64_t kms_endscan_slab_dead;
1093      -        uint64_t kms_endscan_slab_destroyed;
1094      -        uint64_t kms_endscan_nomem;
1095      -        uint64_t kms_endscan_refcnt_changed;
1096      -        uint64_t kms_endscan_nomove_changed;
1097      -        uint64_t kms_endscan_freelist;
1098      -        uint64_t kms_avl_update;
1099      -        uint64_t kms_avl_noupdate;
1100      -        uint64_t kms_no_longer_reclaimable;
1101      -        uint64_t kms_notify_no_longer_reclaimable;
1102      -        uint64_t kms_notify_slab_dead;
1103      -        uint64_t kms_notify_slab_destroyed;
1104      -        uint64_t kms_alloc_fail;
1105      -        uint64_t kms_constructor_fail;
1106      -        uint64_t kms_dead_slabs_freed;
1107      -        uint64_t kms_defrags;
1108      -        uint64_t kms_scans;
1109      -        uint64_t kms_scan_depot_ws_reaps;
1110      -        uint64_t kms_debug_reaps;
1111      -        uint64_t kms_debug_scans;
1112      -} kmem_move_stats;
1113      -#endif  /* KMEM_STATS */
1114      -
1115 1070  /* consolidator knobs */
1116 1071  boolean_t kmem_move_noreap;
1117 1072  boolean_t kmem_move_blocked;
1118 1073  boolean_t kmem_move_fulltilt;
1119 1074  boolean_t kmem_move_any_partial;
1120 1075  
1121 1076  #ifdef  DEBUG
1122 1077  /*
1123 1078   * kmem consolidator debug tunables:
1124 1079   * Ensure code coverage by occasionally running the consolidator even when the
↓ open down ↓ 790 lines elided ↑ open up ↑
1915 1870          }
1916 1871  
1917 1872          if (bcp->bc_next == NULL) {
1918 1873                  /* Transition the slab from completely allocated to partial. */
1919 1874                  ASSERT(sp->slab_refcnt == (sp->slab_chunks - 1));
1920 1875                  ASSERT(sp->slab_chunks > 1);
1921 1876                  list_remove(&cp->cache_complete_slabs, sp);
1922 1877                  cp->cache_complete_slab_count--;
1923 1878                  avl_add(&cp->cache_partial_slabs, sp);
1924 1879          } else {
1925      -#ifdef  DEBUG
1926      -                if (avl_update_gt(&cp->cache_partial_slabs, sp)) {
1927      -                        KMEM_STAT_ADD(kmem_move_stats.kms_avl_update);
1928      -                } else {
1929      -                        KMEM_STAT_ADD(kmem_move_stats.kms_avl_noupdate);
1930      -                }
1931      -#else
1932 1880                  (void) avl_update_gt(&cp->cache_partial_slabs, sp);
1933      -#endif
1934 1881          }
1935 1882  
1936 1883          ASSERT((cp->cache_slab_create - cp->cache_slab_destroy) ==
1937 1884              (cp->cache_complete_slab_count +
1938 1885              avl_numnodes(&cp->cache_partial_slabs) +
1939 1886              (cp->cache_defrag == NULL ? 0 : cp->cache_defrag->kmd_deadcount)));
1940 1887          mutex_exit(&cp->cache_lock);
1941 1888  }
1942 1889  
1943 1890  /*
↓ open down ↓ 1628 lines elided ↑ open up ↑
3572 3519          } else {
3573 3520                  int64_t reclaimable;
3574 3521  
3575 3522                  kmem_defrag_t *kd = cp->cache_defrag;
3576 3523                  kmcp->kmc_move_callbacks.value.ui64     = kd->kmd_callbacks;
3577 3524                  kmcp->kmc_move_yes.value.ui64           = kd->kmd_yes;
3578 3525                  kmcp->kmc_move_no.value.ui64            = kd->kmd_no;
3579 3526                  kmcp->kmc_move_later.value.ui64         = kd->kmd_later;
3580 3527                  kmcp->kmc_move_dont_need.value.ui64     = kd->kmd_dont_need;
3581 3528                  kmcp->kmc_move_dont_know.value.ui64     = kd->kmd_dont_know;
3582      -                kmcp->kmc_move_hunt_found.value.ui64    = kd->kmd_hunt_found;
     3529 +                kmcp->kmc_move_hunt_found.value.ui64    = 0;
3583 3530                  kmcp->kmc_move_slabs_freed.value.ui64   = kd->kmd_slabs_freed;
3584 3531                  kmcp->kmc_defrag.value.ui64             = kd->kmd_defrags;
3585 3532                  kmcp->kmc_scan.value.ui64               = kd->kmd_scans;
3586 3533  
3587 3534                  reclaimable = cp->cache_bufslab - (cp->cache_maxchunks - 1);
3588 3535                  reclaimable = MAX(reclaimable, 0);
3589 3536                  reclaimable += ((uint64_t)reap * cp->cache_magtype->mt_magsize);
3590 3537                  kmcp->kmc_move_reclaimable.value.ui64   = reclaimable;
3591 3538          }
3592 3539  
↓ open down ↓ 550 lines elided ↑ open up ↑
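
With the KMEM_STATS machinery removed, the per-cache kstats populated above become the statistics interface for move activity. A minimal userland reader is sketched below; it assumes the per-cache kstats are exported as unix:0:<cache-name> with class "kmem_cache", and that the statistic names track the kmc_* fields shown above (e.g. "move_yes", "move_dont_know"). Both assumptions should be checked against the kmem_cache_kstat template before relying on them.

    /* cc -o kmovestat kmovestat.c -lkstat   (illustrative sketch) */
    #include <stdio.h>
    #include <kstat.h>

    int
    main(int argc, char **argv)
    {
            kstat_ctl_t *kc;
            kstat_t *ksp;
            kstat_named_t *kn;
            const char *cache = (argc > 1) ? argv[1] : "kmem_alloc_128";
            const char *stats[] = { "move_callbacks", "move_yes", "move_no",
                "move_later", "move_dont_need", "move_dont_know",
                "move_slabs_freed", "move_reclaimable", "defrag", "scan" };
            size_t i;

            if ((kc = kstat_open()) == NULL) {
                    perror("kstat_open");
                    return (1);
            }

            /* Assumed location of the per-cache named kstat. */
            if ((ksp = kstat_lookup(kc, "unix", 0, (char *)cache)) == NULL ||
                kstat_read(kc, ksp, NULL) == -1) {
                    (void) fprintf(stderr, "no kstats for cache %s\n", cache);
                    (void) kstat_close(kc);
                    return (1);
            }

            for (i = 0; i < sizeof (stats) / sizeof (stats[0]); i++) {
                    if ((kn = kstat_data_lookup(ksp, (char *)stats[i])) != NULL)
                            (void) printf("%-20s %llu\n", stats[i],
                                (unsigned long long)kn->value.ui64);
            }

            (void) kstat_close(kc);
            return (0);
    }
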
4143 4090           * Remove the cache from the global cache list so that no one else
4144 4091           * can schedule tasks on its behalf, wait for any pending tasks to
4145 4092           * complete, purge the cache, and then destroy it.
4146 4093           */
4147 4094          mutex_enter(&kmem_cache_lock);
4148 4095          list_remove(&kmem_caches, cp);
4149 4096          mutex_exit(&kmem_cache_lock);
4150 4097  
4151 4098          if (kmem_taskq != NULL)
4152 4099                  taskq_wait(kmem_taskq);
4153      -        if (kmem_move_taskq != NULL)
     4100 +
     4101 +        if (kmem_move_taskq != NULL && cp->cache_defrag != NULL)
4154 4102                  taskq_wait(kmem_move_taskq);
4155 4103  
4156 4104          kmem_cache_magazine_purge(cp);
4157 4105  
4158 4106          mutex_enter(&cp->cache_lock);
4159 4107          if (cp->cache_buftotal != 0)
4160 4108                  cmn_err(CE_WARN, "kmem_cache_destroy: '%s' (%p) not empty",
4161 4109                      cp->cache_name, (void *)cp);
4162 4110          if (cp->cache_defrag != NULL) {
4163 4111                  avl_destroy(&cp->cache_defrag->kmd_moves_pending);
↓ open down ↓ 506 lines elided ↑ open up ↑
4670 4618           * reclaimed until the cache as a whole is no longer fragmented.
4671 4619           *
4672 4620           *      sp->slab_refcnt   kmd_reclaim_numer
4673 4621           *      --------------- < ------------------
4674 4622           *      sp->slab_chunks   KMEM_VOID_FRACTION
4675 4623           */
4676 4624          return ((refcnt * KMEM_VOID_FRACTION) <
4677 4625              (sp->slab_chunks * cp->cache_defrag->kmd_reclaim_numer));
4678 4626  }
4679 4627  
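
For a concrete feel for the inequality above (illustrative numbers only: KMEM_VOID_FRACTION is assumed here to be 8, and kmd_reclaim_numer is assumed to be at its initial value of 1, matching kmem_frag_numer above), a slab with 126 chunks is considered reclaimable only while refcnt * 8 < 126 * 1, i.e. while at most 15 of its 126 buffers are allocated:

    refcnt * KMEM_VOID_FRACTION < slab_chunks * kmd_reclaim_numer
        15 * 8 = 120 < 126 * 1 = 126    -> reclaimable
        16 * 8 = 128 < 126 * 1 = 126    -> false, not reclaimable
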
4680      -static void *
4681      -kmem_hunt_mag(kmem_cache_t *cp, kmem_magazine_t *m, int n, void *buf,
4682      -    void *tbuf)
4683      -{
4684      -        int i;          /* magazine round index */
4685      -
4686      -        for (i = 0; i < n; i++) {
4687      -                if (buf == m->mag_round[i]) {
4688      -                        if (cp->cache_flags & KMF_BUFTAG) {
4689      -                                (void) kmem_cache_free_debug(cp, tbuf,
4690      -                                    caller());
4691      -                        }
4692      -                        m->mag_round[i] = tbuf;
4693      -                        return (buf);
4694      -                }
4695      -        }
4696      -
4697      -        return (NULL);
4698      -}
4699      -
4700 4628  /*
4701      - * Hunt the magazine layer for the given buffer. If found, the buffer is
4702      - * removed from the magazine layer and returned, otherwise NULL is returned.
4703      - * The state of the returned buffer is freed and constructed.
4704      - */
4705      -static void *
4706      -kmem_hunt_mags(kmem_cache_t *cp, void *buf)
4707      -{
4708      -        kmem_cpu_cache_t *ccp;
4709      -        kmem_magazine_t *m;
4710      -        int cpu_seqid;
4711      -        int n;          /* magazine rounds */
4712      -        void *tbuf;     /* temporary swap buffer */
4713      -
4714      -        ASSERT(MUTEX_NOT_HELD(&cp->cache_lock));
4715      -
4716      -        /*
4717      -         * Allocated a buffer to swap with the one we hope to pull out of a
4718      -         * magazine when found.
4719      -         */
4720      -        tbuf = kmem_cache_alloc(cp, KM_NOSLEEP);
4721      -        if (tbuf == NULL) {
4722      -                KMEM_STAT_ADD(kmem_move_stats.kms_hunt_alloc_fail);
4723      -                return (NULL);
4724      -        }
4725      -        if (tbuf == buf) {
4726      -                KMEM_STAT_ADD(kmem_move_stats.kms_hunt_lucky);
4727      -                if (cp->cache_flags & KMF_BUFTAG) {
4728      -                        (void) kmem_cache_free_debug(cp, buf, caller());
4729      -                }
4730      -                return (buf);
4731      -        }
4732      -
4733      -        /* Hunt the depot. */
4734      -        mutex_enter(&cp->cache_depot_lock);
4735      -        n = cp->cache_magtype->mt_magsize;
4736      -        for (m = cp->cache_full.ml_list; m != NULL; m = m->mag_next) {
4737      -                if (kmem_hunt_mag(cp, m, n, buf, tbuf) != NULL) {
4738      -                        mutex_exit(&cp->cache_depot_lock);
4739      -                        return (buf);
4740      -                }
4741      -        }
4742      -        mutex_exit(&cp->cache_depot_lock);
4743      -
4744      -        /* Hunt the per-CPU magazines. */
4745      -        for (cpu_seqid = 0; cpu_seqid < max_ncpus; cpu_seqid++) {
4746      -                ccp = &cp->cache_cpu[cpu_seqid];
4747      -
4748      -                mutex_enter(&ccp->cc_lock);
4749      -                m = ccp->cc_loaded;
4750      -                n = ccp->cc_rounds;
4751      -                if (kmem_hunt_mag(cp, m, n, buf, tbuf) != NULL) {
4752      -                        mutex_exit(&ccp->cc_lock);
4753      -                        return (buf);
4754      -                }
4755      -                m = ccp->cc_ploaded;
4756      -                n = ccp->cc_prounds;
4757      -                if (kmem_hunt_mag(cp, m, n, buf, tbuf) != NULL) {
4758      -                        mutex_exit(&ccp->cc_lock);
4759      -                        return (buf);
4760      -                }
4761      -                mutex_exit(&ccp->cc_lock);
4762      -        }
4763      -
4764      -        kmem_cache_free(cp, tbuf);
4765      -        return (NULL);
4766      -}
4767      -
4768      -/*
4769 4629   * May be called from the kmem_move_taskq, from kmem_cache_move_notify_task(),
4770 4630   * or when the buffer is freed.
4771 4631   */
4772 4632  static void
4773 4633  kmem_slab_move_yes(kmem_cache_t *cp, kmem_slab_t *sp, void *from_buf)
4774 4634  {
4775 4635          ASSERT(MUTEX_HELD(&cp->cache_lock));
4776 4636          ASSERT(KMEM_SLAB_MEMBER(sp, from_buf));
4777 4637  
4778 4638          if (!KMEM_SLAB_IS_PARTIAL(sp)) {
↓ open down ↓ 42 lines elided ↑ open up ↑
4821 4681   * guarantee the present whereabouts of the buffer to be moved, so it is up to
4822 4682   * the client to safely determine whether or not it is still using the buffer.
4823 4683   * The client must not free either of the buffers passed to the move callback,
4824 4684   * since kmem wants to free them directly to the slab layer. The client response
4825 4685   * tells kmem which of the two buffers to free:
4826 4686   *
4827 4687   * YES          kmem frees the old buffer (the move was successful)
4828 4688   * NO           kmem frees the new buffer, marks the slab of the old buffer
4829 4689   *              non-reclaimable to avoid bothering the client again
4830 4690   * LATER        kmem frees the new buffer, increments slab_later_count
4831      - * DONT_KNOW    kmem frees the new buffer, searches mags for the old buffer
     4691 + * DONT_KNOW    kmem frees the new buffer
4832 4692   * DONT_NEED    kmem frees both the old buffer and the new buffer
4833 4693   *
4834 4694   * The pending callback argument now being processed contains both of the
4835 4695   * buffers (old and new) passed to the move callback function, the slab of the
4836 4696   * old buffer, and flags related to the move request, such as whether or not the
4837 4697   * system was desperate for memory.
4838 4698   *
4839 4699   * Slabs are not freed while there is a pending callback, but instead are kept
4840 4700   * on a deadlist, which is drained after the last callback completes. This means
4841 4701   * that slabs are safe to access until kmem_move_end(), no matter how many of
↓ open down ↓ 13 lines elided ↑ open up ↑
4855 4715          ASSERT(MUTEX_NOT_HELD(&cp->cache_lock));
4856 4716          ASSERT(KMEM_SLAB_MEMBER(sp, callback->kmm_from_buf));
4857 4717  
4858 4718          /*
4859 4719           * The number of allocated buffers on the slab may have changed since we
4860 4720           * last checked the slab's reclaimability (when the pending move was
4861 4721           * enqueued), or the client may have responded NO when asked to move
4862 4722           * another buffer on the same slab.
4863 4723           */
4864 4724          if (!kmem_slab_is_reclaimable(cp, sp, callback->kmm_flags)) {
4865      -                KMEM_STAT_ADD(kmem_move_stats.kms_no_longer_reclaimable);
4866      -                KMEM_STAT_COND_ADD((callback->kmm_flags & KMM_NOTIFY),
4867      -                    kmem_move_stats.kms_notify_no_longer_reclaimable);
4868 4725                  kmem_slab_free(cp, callback->kmm_to_buf);
4869 4726                  kmem_move_end(cp, callback);
4870 4727                  return;
4871 4728          }
4872 4729  
4873 4730          /*
4874      -         * Hunting magazines is expensive, so we'll wait to do that until the
4875      -         * client responds KMEM_CBRC_DONT_KNOW. However, checking the slab layer
4876      -         * is cheap, so we might as well do that here in case we can avoid
4877      -         * bothering the client.
     4731 +         * Checking the slab layer is easy, so we might as well do that here
     4732 +         * in case we can avoid bothering the client.
4878 4733           */
4879 4734          mutex_enter(&cp->cache_lock);
4880 4735          free_on_slab = (kmem_slab_allocated(cp, sp,
4881 4736              callback->kmm_from_buf) == NULL);
4882 4737          mutex_exit(&cp->cache_lock);
4883 4738  
4884 4739          if (free_on_slab) {
4885      -                KMEM_STAT_ADD(kmem_move_stats.kms_hunt_found_slab);
4886 4740                  kmem_slab_free(cp, callback->kmm_to_buf);
4887 4741                  kmem_move_end(cp, callback);
4888 4742                  return;
4889 4743          }
4890 4744  
4891 4745          if (cp->cache_flags & KMF_BUFTAG) {
4892 4746                  /*
4893 4747                   * Make kmem_cache_alloc_debug() apply the constructor for us.
4894 4748                   */
4895 4749                  if (kmem_cache_alloc_debug(cp, callback->kmm_to_buf,
4896 4750                      KM_NOSLEEP, 1, caller()) != 0) {
4897      -                        KMEM_STAT_ADD(kmem_move_stats.kms_alloc_fail);
4898 4751                          kmem_move_end(cp, callback);
4899 4752                          return;
4900 4753                  }
4901 4754          } else if (cp->cache_constructor != NULL &&
4902 4755              cp->cache_constructor(callback->kmm_to_buf, cp->cache_private,
4903 4756              KM_NOSLEEP) != 0) {
4904 4757                  atomic_inc_64(&cp->cache_alloc_fail);
4905      -                KMEM_STAT_ADD(kmem_move_stats.kms_constructor_fail);
4906 4758                  kmem_slab_free(cp, callback->kmm_to_buf);
4907 4759                  kmem_move_end(cp, callback);
4908 4760                  return;
4909 4761          }
4910 4762  
4911      -        KMEM_STAT_ADD(kmem_move_stats.kms_callbacks);
4912      -        KMEM_STAT_COND_ADD((callback->kmm_flags & KMM_NOTIFY),
4913      -            kmem_move_stats.kms_notify_callbacks);
4914 4763          cp->cache_defrag->kmd_callbacks++;
4915 4764          cp->cache_defrag->kmd_thread = curthread;
4916 4765          cp->cache_defrag->kmd_from_buf = callback->kmm_from_buf;
4917 4766          cp->cache_defrag->kmd_to_buf = callback->kmm_to_buf;
4918 4767          DTRACE_PROBE2(kmem__move__start, kmem_cache_t *, cp, kmem_move_t *,
4919 4768              callback);
4920 4769  
4921 4770          response = cp->cache_move(callback->kmm_from_buf,
4922 4771              callback->kmm_to_buf, cp->cache_bufsize, cp->cache_private);
4923 4772  
4924 4773          DTRACE_PROBE3(kmem__move__end, kmem_cache_t *, cp, kmem_move_t *,
4925 4774              callback, kmem_cbrc_t, response);
4926 4775          cp->cache_defrag->kmd_thread = NULL;
4927 4776          cp->cache_defrag->kmd_from_buf = NULL;
4928 4777          cp->cache_defrag->kmd_to_buf = NULL;
4929 4778  
4930 4779          if (response == KMEM_CBRC_YES) {
4931      -                KMEM_STAT_ADD(kmem_move_stats.kms_yes);
4932 4780                  cp->cache_defrag->kmd_yes++;
4933 4781                  kmem_slab_free_constructed(cp, callback->kmm_from_buf, B_FALSE);
4934 4782                  /* slab safe to access until kmem_move_end() */
4935 4783                  if (sp->slab_refcnt == 0)
4936 4784                          cp->cache_defrag->kmd_slabs_freed++;
4937 4785                  mutex_enter(&cp->cache_lock);
4938 4786                  kmem_slab_move_yes(cp, sp, callback->kmm_from_buf);
4939 4787                  mutex_exit(&cp->cache_lock);
4940 4788                  kmem_move_end(cp, callback);
4941 4789                  return;
4942 4790          }
4943 4791  
4944 4792          switch (response) {
4945 4793          case KMEM_CBRC_NO:
4946      -                KMEM_STAT_ADD(kmem_move_stats.kms_no);
4947 4794                  cp->cache_defrag->kmd_no++;
4948 4795                  mutex_enter(&cp->cache_lock);
4949 4796                  kmem_slab_move_no(cp, sp, callback->kmm_from_buf);
4950 4797                  mutex_exit(&cp->cache_lock);
4951 4798                  break;
4952 4799          case KMEM_CBRC_LATER:
4953      -                KMEM_STAT_ADD(kmem_move_stats.kms_later);
4954 4800                  cp->cache_defrag->kmd_later++;
4955 4801                  mutex_enter(&cp->cache_lock);
4956 4802                  if (!KMEM_SLAB_IS_PARTIAL(sp)) {
4957 4803                          mutex_exit(&cp->cache_lock);
4958 4804                          break;
4959 4805                  }
4960 4806  
4961 4807                  if (++sp->slab_later_count >= KMEM_DISBELIEF) {
4962      -                        KMEM_STAT_ADD(kmem_move_stats.kms_disbelief);
4963 4808                          kmem_slab_move_no(cp, sp, callback->kmm_from_buf);
4964 4809                  } else if (!(sp->slab_flags & KMEM_SLAB_NOMOVE)) {
4965 4810                          sp->slab_stuck_offset = KMEM_SLAB_OFFSET(sp,
4966 4811                              callback->kmm_from_buf);
4967 4812                  }
4968 4813                  mutex_exit(&cp->cache_lock);
4969 4814                  break;
4970 4815          case KMEM_CBRC_DONT_NEED:
4971      -                KMEM_STAT_ADD(kmem_move_stats.kms_dont_need);
4972 4816                  cp->cache_defrag->kmd_dont_need++;
4973 4817                  kmem_slab_free_constructed(cp, callback->kmm_from_buf, B_FALSE);
4974 4818                  if (sp->slab_refcnt == 0)
4975 4819                          cp->cache_defrag->kmd_slabs_freed++;
4976 4820                  mutex_enter(&cp->cache_lock);
4977 4821                  kmem_slab_move_yes(cp, sp, callback->kmm_from_buf);
4978 4822                  mutex_exit(&cp->cache_lock);
4979 4823                  break;
4980 4824          case KMEM_CBRC_DONT_KNOW:
4981      -                KMEM_STAT_ADD(kmem_move_stats.kms_dont_know);
     4825 +                /*
     4826 +                 * If we don't know if we can move this buffer or not, we'll
     4827 +                 * just assume that we can't:  if the buffer is in fact free,
     4828 +                 * then it is sitting in one of the per-CPU magazines or in
     4829 +                 * a full magazine in the depot layer.  Either way, because
     4830 +                 * defrag is induced in the same logic that reaps a cache,
     4831 +                 * it's likely that full magazines will be returned to the
     4832 +                 * system soon (thereby accomplishing what we're trying to
     4833 +                 * accomplish here: return those magazines to their slabs).
     4834 +                 * Given this, any work that we might do now to locate a buffer
     4835 +                 * in a magazine is wasted (and expensive!) work; we bump
     4836 +                 * a counter in this case and otherwise assume that we can't
     4837 +                 * move it.
     4838 +                 */
4982 4839                  cp->cache_defrag->kmd_dont_know++;
4983      -                if (kmem_hunt_mags(cp, callback->kmm_from_buf) != NULL) {
4984      -                        KMEM_STAT_ADD(kmem_move_stats.kms_hunt_found_mag);
4985      -                        cp->cache_defrag->kmd_hunt_found++;
4986      -                        kmem_slab_free_constructed(cp, callback->kmm_from_buf,
4987      -                            B_TRUE);
4988      -                        if (sp->slab_refcnt == 0)
4989      -                                cp->cache_defrag->kmd_slabs_freed++;
4990      -                        mutex_enter(&cp->cache_lock);
4991      -                        kmem_slab_move_yes(cp, sp, callback->kmm_from_buf);
4992      -                        mutex_exit(&cp->cache_lock);
4993      -                }
4994 4840                  break;
4995 4841          default:
4996 4842                  panic("'%s' (%p) unexpected move callback response %d\n",
4997 4843                      cp->cache_name, (void *)cp, response);
4998 4844          }
4999 4845  
5000 4846          kmem_slab_free_constructed(cp, callback->kmm_to_buf, B_FALSE);
5001 4847          kmem_move_end(cp, callback);
5002 4848  }
5003 4849  
↓ open down ↓ 4 lines elided ↑ open up ↑
5008 4854          void *to_buf;
5009 4855          avl_index_t index;
5010 4856          kmem_move_t *callback, *pending;
5011 4857          ulong_t n;
5012 4858  
5013 4859          ASSERT(taskq_member(kmem_taskq, curthread));
5014 4860          ASSERT(MUTEX_NOT_HELD(&cp->cache_lock));
5015 4861          ASSERT(sp->slab_flags & KMEM_SLAB_MOVE_PENDING);
5016 4862  
5017 4863          callback = kmem_cache_alloc(kmem_move_cache, KM_NOSLEEP);
5018      -        if (callback == NULL) {
5019      -                KMEM_STAT_ADD(kmem_move_stats.kms_callback_alloc_fail);
     4864 +
     4865 +        if (callback == NULL)
5020 4866                  return (B_FALSE);
5021      -        }
5022 4867  
5023 4868          callback->kmm_from_slab = sp;
5024 4869          callback->kmm_from_buf = buf;
5025 4870          callback->kmm_flags = flags;
5026 4871  
5027 4872          mutex_enter(&cp->cache_lock);
5028 4873  
5029 4874          n = avl_numnodes(&cp->cache_partial_slabs);
5030 4875          if ((n == 0) || ((n == 1) && !(flags & KMM_DEBUG))) {
5031 4876                  mutex_exit(&cp->cache_lock);
↓ open down ↓ 4 lines elided ↑ open up ↑
5036 4881          pending = avl_find(&cp->cache_defrag->kmd_moves_pending, buf, &index);
5037 4882          if (pending != NULL) {
5038 4883                  /*
5039 4884                   * If the move is already pending and we're desperate now,
5040 4885                   * update the move flags.
5041 4886                   */
5042 4887                  if (flags & KMM_DESPERATE) {
5043 4888                          pending->kmm_flags |= KMM_DESPERATE;
5044 4889                  }
5045 4890                  mutex_exit(&cp->cache_lock);
5046      -                KMEM_STAT_ADD(kmem_move_stats.kms_already_pending);
5047 4891                  kmem_cache_free(kmem_move_cache, callback);
5048 4892                  return (B_TRUE);
5049 4893          }
5050 4894  
5051 4895          to_buf = kmem_slab_alloc_impl(cp, avl_first(&cp->cache_partial_slabs),
5052 4896              B_FALSE);
5053 4897          callback->kmm_to_buf = to_buf;
5054 4898          avl_insert(&cp->cache_defrag->kmd_moves_pending, callback, index);
5055 4899  
5056 4900          mutex_exit(&cp->cache_lock);
5057 4901  
5058 4902          if (!taskq_dispatch(kmem_move_taskq, (task_func_t *)kmem_move_buffer,
5059 4903              callback, TQ_NOSLEEP)) {
5060      -                KMEM_STAT_ADD(kmem_move_stats.kms_callback_taskq_fail);
5061 4904                  mutex_enter(&cp->cache_lock);
5062 4905                  avl_remove(&cp->cache_defrag->kmd_moves_pending, callback);
5063 4906                  mutex_exit(&cp->cache_lock);
5064 4907                  kmem_slab_free(cp, to_buf);
5065 4908                  kmem_cache_free(kmem_move_cache, callback);
5066 4909                  return (B_FALSE);
5067 4910          }
5068 4911  
5069 4912          return (B_TRUE);
5070 4913  }
↓ open down ↓ 25 lines elided ↑ open up ↑
5096 4939                   */
5097 4940                  while ((sp = list_remove_head(deadlist)) != NULL) {
5098 4941                          if (sp->slab_flags & KMEM_SLAB_MOVE_PENDING) {
5099 4942                                  list_insert_tail(deadlist, sp);
5100 4943                                  break;
5101 4944                          }
5102 4945                          cp->cache_defrag->kmd_deadcount--;
5103 4946                          cp->cache_slab_destroy++;
5104 4947                          mutex_exit(&cp->cache_lock);
5105 4948                          kmem_slab_destroy(cp, sp);
5106      -                        KMEM_STAT_ADD(kmem_move_stats.kms_dead_slabs_freed);
5107 4949                          mutex_enter(&cp->cache_lock);
5108 4950                  }
5109 4951          }
5110 4952          mutex_exit(&cp->cache_lock);
5111 4953          kmem_cache_free(kmem_move_cache, callback);
5112 4954  }
5113 4955  
5114 4956  /*
5115 4957   * Move buffers from least used slabs first by scanning backwards from the end
5116 4958   * of the partial slab list. Scan at most max_scan candidate slabs and move
↓ open down ↓ 124 lines elided ↑ open up ↑
5241 5083                                           * context where that is determined
5242 5084                                           * requires the slab to exist.
5243 5085                                           * Fortunately, a pending move also
5244 5086                                           * means we don't need to destroy the
5245 5087                                           * slab here, since it will get
5246 5088                                           * destroyed along with any other slabs
5247 5089                                           * on the deadlist after the last
5248 5090                                           * pending move completes.
5249 5091                                           */
5250 5092                                          list_insert_head(deadlist, sp);
5251      -                                        KMEM_STAT_ADD(kmem_move_stats.
5252      -                                            kms_endscan_slab_dead);
5253 5093                                          return (-1);
5254 5094                                  }
5255 5095  
5256 5096                                  /*
5257 5097                                   * Destroy the slab now if it was completely
5258 5098                                   * freed while we dropped cache_lock and there
5259 5099                                   * are no pending moves. Since slab_refcnt
5260 5100                                   * cannot change once it reaches zero, no new
5261 5101                                   * pending moves from that slab are possible.
5262 5102                                   */
5263 5103                                  cp->cache_defrag->kmd_deadcount--;
5264 5104                                  cp->cache_slab_destroy++;
5265 5105                                  mutex_exit(&cp->cache_lock);
5266 5106                                  kmem_slab_destroy(cp, sp);
5267      -                                KMEM_STAT_ADD(kmem_move_stats.
5268      -                                    kms_dead_slabs_freed);
5269      -                                KMEM_STAT_ADD(kmem_move_stats.
5270      -                                    kms_endscan_slab_destroyed);
5271 5107                                  mutex_enter(&cp->cache_lock);
5272 5108                                  /*
5273 5109                                   * Since we can't pick up the scan where we left
5274 5110                                   * off, abort the scan and say nothing about the
5275 5111                                   * number of reclaimable slabs.
5276 5112                                   */
5277 5113                                  return (-1);
5278 5114                          }
5279 5115  
5280 5116                          if (!success) {
5281 5117                                  /*
5282 5118                                   * Abort the scan if there is not enough memory
5283 5119                                   * for the request and say nothing about the
5284 5120                                   * number of reclaimable slabs.
5285 5121                                   */
5286      -                                KMEM_STAT_COND_ADD(s < max_slabs,
5287      -                                    kmem_move_stats.kms_endscan_nomem);
5288 5122                                  return (-1);
5289 5123                          }
5290 5124  
5291 5125                          /*
5292 5126                           * The slab's position changed while the lock was
5293 5127                           * dropped, so we don't know where we are in the
5294 5128                           * sequence any more.
5295 5129                           */
5296 5130                          if (sp->slab_refcnt != refcnt) {
5297 5131                                  /*
5298 5132                                   * If this is a KMM_DEBUG move, the slab_refcnt
5299 5133                                   * may have changed because we allocated a
5300 5134                                   * destination buffer on the same slab. In that
5301 5135                                   * case, we're not interested in counting it.
5302 5136                                   */
5303      -                                KMEM_STAT_COND_ADD(!(flags & KMM_DEBUG) &&
5304      -                                    (s < max_slabs),
5305      -                                    kmem_move_stats.kms_endscan_refcnt_changed);
5306 5137                                  return (-1);
5307 5138                          }
5308      -                        if ((sp->slab_flags & KMEM_SLAB_NOMOVE) != nomove) {
5309      -                                KMEM_STAT_COND_ADD(s < max_slabs,
5310      -                                    kmem_move_stats.kms_endscan_nomove_changed);
     5139 +                        if ((sp->slab_flags & KMEM_SLAB_NOMOVE) != nomove)
5311 5140                                  return (-1);
5312      -                        }
5313 5141  
5314 5142                          /*
5315 5143                           * Generating a move request allocates a destination
5316 5144                           * buffer from the slab layer, bumping the first partial
5317 5145                           * slab if it is completely allocated. If the current
5318 5146                           * slab becomes the first partial slab as a result, we
5319 5147                           * can't continue to scan backwards.
5320 5148                           *
5321 5149                           * If this is a KMM_DEBUG move and we allocated the
5322 5150                           * destination buffer from the last partial slab, then
↓ open down ↓ 6 lines elided ↑ open up ↑
5329 5157                                  /*
5330 5158                                   * We're not interested in a second KMM_DEBUG
5331 5159                                   * move.
5332 5160                                   */
5333 5161                                  goto end_scan;
5334 5162                          }
5335 5163                  }
5336 5164          }
5337 5165  end_scan:
5338 5166  
5339      -        KMEM_STAT_COND_ADD(!(flags & KMM_DEBUG) &&
5340      -            (s < max_slabs) &&
5341      -            (sp == avl_first(&cp->cache_partial_slabs)),
5342      -            kmem_move_stats.kms_endscan_freelist);
5343      -
5344 5167          return (s);
5345 5168  }
5346 5169  
5347 5170  typedef struct kmem_move_notify_args {
5348 5171          kmem_cache_t *kmna_cache;
5349 5172          void *kmna_buf;
5350 5173  } kmem_move_notify_args_t;
5351 5174  
5352 5175  static void
5353 5176  kmem_cache_move_notify_task(void *arg)
↓ open down ↓ 39 lines elided ↑ open up ↑
5393 5216                  ASSERT(sp->slab_flags & KMEM_SLAB_MOVE_PENDING);
5394 5217                  sp->slab_flags &= ~KMEM_SLAB_MOVE_PENDING;
5395 5218                  if (sp->slab_refcnt == 0) {
5396 5219                          list_t *deadlist = &cp->cache_defrag->kmd_deadlist;
5397 5220                          list_remove(deadlist, sp);
5398 5221  
5399 5222                          if (!avl_is_empty(
5400 5223                              &cp->cache_defrag->kmd_moves_pending)) {
5401 5224                                  list_insert_head(deadlist, sp);
5402 5225                                  mutex_exit(&cp->cache_lock);
5403      -                                KMEM_STAT_ADD(kmem_move_stats.
5404      -                                    kms_notify_slab_dead);
5405 5226                                  return;
5406 5227                          }
5407 5228  
5408 5229                          cp->cache_defrag->kmd_deadcount--;
5409 5230                          cp->cache_slab_destroy++;
5410 5231                          mutex_exit(&cp->cache_lock);
5411 5232                          kmem_slab_destroy(cp, sp);
5412      -                        KMEM_STAT_ADD(kmem_move_stats.kms_dead_slabs_freed);
5413      -                        KMEM_STAT_ADD(kmem_move_stats.
5414      -                            kms_notify_slab_destroyed);
5415 5233                          return;
5416 5234                  }
5417 5235          } else {
5418 5236                  kmem_slab_move_yes(cp, sp, buf);
5419 5237          }
5420 5238          mutex_exit(&cp->cache_lock);
5421 5239  }
5422 5240  
5423 5241  void
5424 5242  kmem_cache_move_notify(kmem_cache_t *cp, void *buf)
5425 5243  {
5426 5244          kmem_move_notify_args_t *args;
5427 5245  
5428      -        KMEM_STAT_ADD(kmem_move_stats.kms_notify);
5429 5246          args = kmem_alloc(sizeof (kmem_move_notify_args_t), KM_NOSLEEP);
5430 5247          if (args != NULL) {
5431 5248                  args->kmna_cache = cp;
5432 5249                  args->kmna_buf = buf;
5433 5250                  if (!taskq_dispatch(kmem_taskq,
5434 5251                      (task_func_t *)kmem_cache_move_notify_task, args,
5435 5252                      TQ_NOSLEEP))
5436 5253                          kmem_free(args, sizeof (kmem_move_notify_args_t));
5437 5254          }
5438 5255  }
↓ open down ↓ 2 lines elided ↑ open up ↑
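
For context, kmem_cache_move_notify() above is the entry point a client can use when a buffer it previously declined to move (a LATER response, or one of the DONT_KNOW transition cases discussed earlier) becomes movable after all; the call is cheap and merely dispatches kmem_cache_move_notify_task(). A usage sketch, reusing the hypothetical object_ naming from the earlier examples (o_move_deferred and object_cache are invented for illustration):

    /*
     * Illustrative client code: once the last hold is dropped, tell
     * kmem that a buffer we previously answered LATER for can now be
     * reconsidered.
     */
    static void
    object_rele(object_t *op)
    {
            if (object_drop_hold(op) == 0 && op->o_move_deferred) {
                    op->o_move_deferred = B_FALSE;
                    kmem_cache_move_notify(object_cache, op);
            }
    }
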
5441 5258  kmem_cache_defrag(kmem_cache_t *cp)
5442 5259  {
5443 5260          size_t n;
5444 5261  
5445 5262          ASSERT(cp->cache_defrag != NULL);
5446 5263  
5447 5264          mutex_enter(&cp->cache_lock);
5448 5265          n = avl_numnodes(&cp->cache_partial_slabs);
5449 5266          if (n > 1) {
5450 5267                  /* kmem_move_buffers() drops and reacquires cache_lock */
5451      -                KMEM_STAT_ADD(kmem_move_stats.kms_defrags);
5452 5268                  cp->cache_defrag->kmd_defrags++;
5453 5269                  (void) kmem_move_buffers(cp, n, 0, KMM_DESPERATE);
5454 5270          }
5455 5271          mutex_exit(&cp->cache_lock);
5456 5272  }
5457 5273  
5458 5274  /* Is this cache above the fragmentation threshold? */
5459 5275  static boolean_t
5460 5276  kmem_cache_frag_threshold(kmem_cache_t *cp, uint64_t nfree)
5461 5277  {
↓ open down ↓ 78 lines elided ↑ open up ↑
5540 5356                  /*
5541 5357                   * Consolidate reclaimable slabs from the end of the partial
5542 5358                   * slab list (scan at most kmem_reclaim_scan_range slabs to find
5543 5359                   * reclaimable slabs). Keep track of how many candidate slabs we
5544 5360                   * looked for and how many we actually found so we can adjust
5545 5361                   * the definition of a candidate slab if we're having trouble
5546 5362                   * finding them.
5547 5363                   *
5548 5364                   * kmem_move_buffers() drops and reacquires cache_lock.
5549 5365                   */
5550      -                KMEM_STAT_ADD(kmem_move_stats.kms_scans);
5551 5366                  kmd->kmd_scans++;
5552 5367                  slabs_found = kmem_move_buffers(cp, kmem_reclaim_scan_range,
5553 5368                      kmem_reclaim_max_slabs, 0);
5554 5369                  if (slabs_found >= 0) {
5555 5370                          kmd->kmd_slabs_sought += kmem_reclaim_max_slabs;
5556 5371                          kmd->kmd_slabs_found += slabs_found;
5557 5372                  }
5558 5373  
5559 5374                  if (++kmd->kmd_tries >= kmem_reclaim_scan_range) {
5560 5375                          kmd->kmd_tries = 0;
↓ open down ↓ 20 lines elided ↑ open up ↑
5581 5396                           * In a debug kernel we want the consolidator to
5582 5397                           * run occasionally even when there is plenty of
5583 5398                           * memory.
5584 5399                           */
5585 5400                          uint16_t debug_rand;
5586 5401  
5587 5402                          (void) random_get_bytes((uint8_t *)&debug_rand, 2);
5588 5403                          if (!kmem_move_noreap &&
5589 5404                              ((debug_rand % kmem_mtb_reap) == 0)) {
5590 5405                                  mutex_exit(&cp->cache_lock);
5591      -                                KMEM_STAT_ADD(kmem_move_stats.kms_debug_reaps);
5592 5406                                  kmem_cache_reap(cp);
5593 5407                                  return;
5594 5408                          } else if ((debug_rand % kmem_mtb_move) == 0) {
5595      -                                KMEM_STAT_ADD(kmem_move_stats.kms_scans);
5596      -                                KMEM_STAT_ADD(kmem_move_stats.kms_debug_scans);
5597 5409                                  kmd->kmd_scans++;
5598 5410                                  (void) kmem_move_buffers(cp,
5599 5411                                      kmem_reclaim_scan_range, 1, KMM_DEBUG);
5600 5412                          }
5601 5413                  }
5602 5414  #endif  /* DEBUG */
5603 5415          }
5604 5416  
5605 5417          mutex_exit(&cp->cache_lock);
5606 5418  
5607      -        if (reap) {
5608      -                KMEM_STAT_ADD(kmem_move_stats.kms_scan_depot_ws_reaps);
     5419 +        if (reap)
5609 5420                  kmem_depot_ws_reap(cp);
5610      -        }
5611 5421  }
    