MFV: illumos-gate@2aba3acda67326648fd60aaf2bfb4e18ee8c04ed
9816 Multi-TRB xhci transfers should use event data
9817 xhci needs to always set slot context
8550 increase xhci bulk transfer sgl count
9818 xhci_transfer_get_tdsize can return values that are too large
Reviewed by: Alex Wilson <alex.wilson@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Robert Mustacchi <rm@joyent.com>
    
      
    
--- old/usr/src/uts/common/io/usb/hcd/xhci/xhci_ring.c
+++ new/usr/src/uts/common/io/usb/hcd/xhci/xhci_ring.c
   1    1  /*
   2    2   * This file and its contents are supplied under the terms of the
  
  
   3    3   * Common Development and Distribution License ("CDDL"), version 1.0.
   4    4   * You may only use this file in accordance with the terms of version
   5    5   * 1.0 of the CDDL.
   6    6   *
   7    7   * A full copy of the text of the CDDL should have accompanied this
   8    8   * source.  A copy of the CDDL is also available via the Internet at
   9    9   * http://www.illumos.org/license/CDDL.
  10   10   */
  11   11  
  12   12  /*
  13      - * Copyright 2016 Joyent, Inc.
       13 + * Copyright (c) 2018, Joyent, Inc.
  14   14   */
  15   15  
  16   16  /*
  17   17   * -----------------------------
  18   18   * xHCI Ring Management Routines
  19   19   * -----------------------------
  20   20   *
  21   21   * There are three major different types of rings for xHCI, these are:
  22   22   *
  23   23   * 1) Command Rings
  24   24   * 2) Event Rings
  25   25   * 3) Transfer Rings
  26   26   *
  27   27   * Command and Transfer rings function in similar ways while the event rings are
  28   28   * different. The difference comes in who is the consumer and who is the
  29   29   * producer. In the case of command and transfer rings, the driver is the
  30   30   * producer. For the event ring the driver is the consumer.
  31   31   *
  32   32   * Each ring in xhci has a synthetic head and tail register. Each entry in a
  33   33   * ring has a bit that's often referred to as the 'Cycle bit'. The cycle bit is
  34   34   * toggled as a means of saying that a given entry needs to be consumed.
  35   35   *
  36   36   * When a ring is created, all of the data in it is initialized to zero and the
  37   37   * producer and consumer agree that when the cycle bit is toggled, the ownership
  38   38   * of the entry is transferred from the producer to the consumer.  For example,
  39   39   * the command ring defaults to saying that a cycle bit of one is what indicates
  40   40   * the command is owned by the hardware. So as the driver (the producer) fills
  41   41   * in entries, the driver toggles the cycle bit from 0->1 as part of writing out
  42   42   * the TRB.  When the command ring's doorbell is rung, the hardware (the
  43   43   * consumer) begins processing commands. It will process them until one of two
  44   44   * things happens:
  45   45   *
  46   46   * 1) The hardware encounters an entry with the old cycle bit (0 in this case)
  47   47   *
  48   48   * 2) The hardware hits the last entry in the ring which is a special kind of
  49   49   * entry called a LINK TRB.
  50   50   *
  51   51   * A LINK TRB has two purposes:
  52   52   *
  53   53   * 1) Indicate where processing should be redirected. This can potentially be to
  54   54   * another memory segment; however, this driver always programs LINK TRBs to
  55   55   * point back to the start of the ring.
  56   56   *
  57   57   * 2) Indicate whether or not the cycle bit should be changed. We always
  58   58   * indicate that the cycle bit should be toggled when a LINK TRB is processed.
  59   59   *
  60   60   * In this same example, whereas the driver (the producer) would be setting the
  61   61   * cycle to 1 to indicate that an entry is to be processed, the driver would now
  62   62   * set it to 0. Similarly, the hardware (the consumer) would be looking for a
  63   63   * 0 to determine whether or not it should process the entry.
  64   64   *
  65   65   * Currently, when the driver allocates rings, it always allocates a single page
  66   66   * for the ring. The entire page is dedicated to ring use, which is determined
  67   67   * based on the device's PAGESIZE register. The last entry in a given page is
  68   68   * always configured as a LINK TRB. As each entry in a ring is 16 bytes, this
  69   69   * gives us 255 usable descriptors on x86 and 511 on SPARC, as
  70   70   * PAGESIZE is 4k and 8k respectively.
  71   71   *
  72   72   * The driver is always the producer for all rings except for the event ring,
  73   73   * where it is the consumer.
  74   74   *
  75   75   * ----------------------
  76   76   * Head and Tail Pointers
  77   77   * ----------------------
  78   78   *
  79   79   * Now, while we have the cycle bits for the ring explained, we still need to
  80   80   * keep track of what we consider the head and tail pointers, what the xHCI
  81   81   * specification calls enqueue (head) and dequeue (tail) pointers. Now, in all
  82   82   * the cases here, the actual tracking of the head pointer is basically done by
  83   83   * the cycle bit; however, we maintain an actual offset in the xhci_ring_t
  84   84   * structure. The tail is usually less synthetic; however, it's up to different
  85   85   * folks to maintain it.
  86   86   *
  87   87   * We handle the command and transfer rings the same way. The head pointer
  88   88   * indicates where we should insert the next TRB to transfer. The tail pointer
  89   89   * indicates the last thing that hardware has told us it has processed. If the
  90   90   * head and tail point to the same index, then we know the ring is empty.
  91   91   *
  92   92   * We increment the head pointer whenever we insert an entry. Note that we do
  93   93   * not tell hardware about this in any way; it's just maintained by the cycle
  94   94   * bit. Then, we keep track of what hardware has processed in our tail pointer,
  95   95   * incrementing it only when we have an interrupt that indicates that it's been
  96   96   * processed.
  97   97   *
  98   98   * One oddity here is that we only get notified of this via the event ring. So
  99   99   * when the event ring encounters this information, it needs to go back and
 100  100   * increment our command and transfer ring tails after processing events.
 101  101   *
 102  102   * For the event ring, we handle things differently. We still initialize
 103  103   * everything to zero; however, we start processing things and looking at cycle
 104  104   * bits only when we get an interrupt from hardware. With the event ring, we do
 105  105   * *not* maintain a head pointer (it's still in the structure, but unused).  We
 106  106   * always start processing at the tail pointer and use the cycle bit to indicate
 107  107   * what we should process. Once we're done incrementing things, we go and notify
 108  108   * the hardware of how far we got with this process by updating the tail for the
 109  109   * event ring via a memory mapped register.
 110  110   */
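
A minimal sketch of the ownership handshake described above (illustrative
only, not code from this file): the consumer owns an entry once the entry's
cycle bit matches the consumer's current view of the cycle.

    #include <stdint.h>
    #include <stdbool.h>

    #define TRB_CYCLE 0x1u  /* stand-in for XHCI_TRB_CYCLE */

    /* Illustrative only: does this TRB belong to the consumer yet? */
    static bool
    trb_owned_by_consumer(uint32_t trb_flags, uint32_t consumer_cycle)
    {
            return ((trb_flags & TRB_CYCLE) == consumer_cycle);
    }
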
 111  111  
 112  112  #include <sys/usb/hcd/xhci/xhci.h>
 113  113  
 114  114  void
 115  115  xhci_ring_free(xhci_ring_t *xrp)
 116  116  {
 117  117          if (xrp->xr_trb != NULL) {
 118  118                  xhci_dma_free(&xrp->xr_dma);
 119  119                  xrp->xr_trb = NULL;
 120  120          }
 121  121          xrp->xr_ntrb = 0;
 122  122          xrp->xr_head = 0;
 123  123          xrp->xr_tail = 0;
 124  124          xrp->xr_cycle = 0;
 125  125  }
 126  126  
 127  127  /*
 128  128   * Initialize a ring that hasn't been used and set up its link pointer back to
 129  129   * it.
 130  130   */
 131  131  int
 132  132  xhci_ring_reset(xhci_t *xhcip, xhci_ring_t *xrp)
 133  133  {
 134  134          xhci_trb_t *ltrb;
 135  135  
 136  136          ASSERT(xrp->xr_trb != NULL);
 137  137  
 138  138          bzero(xrp->xr_trb, sizeof (xhci_trb_t) * xrp->xr_ntrb);
 139  139          xrp->xr_head = 0;
 140  140          xrp->xr_tail = 0;
 141  141          xrp->xr_cycle = 1;
 142  142  
 143  143          /*
 144  144           * Set up the link TRB back to ourselves.
 145  145           */
 146  146          ltrb = &xrp->xr_trb[xrp->xr_ntrb - 1];
 147  147          ltrb->trb_addr = LE_64(xhci_dma_pa(&xrp->xr_dma));
 148  148          ltrb->trb_flags = LE_32(XHCI_TRB_TYPE_LINK | XHCI_TRB_LINKSEG);
 149  149  
 150  150          XHCI_DMA_SYNC(xrp->xr_dma, DDI_DMA_SYNC_FORDEV);
 151  151          if (xhci_check_dma_handle(xhcip, &xrp->xr_dma) != DDI_FM_OK) {
 152  152                  ddi_fm_service_impact(xhcip->xhci_dip, DDI_SERVICE_LOST);
 153  153                  return (EIO);
 154  154          }
 155  155  
 156  156          return (0);
 157  157  }
 158  158  
 159  159  int
 160  160  xhci_ring_alloc(xhci_t *xhcip, xhci_ring_t *xrp)
 161  161  {
 162  162          ddi_dma_attr_t attr;
 163  163          ddi_device_acc_attr_t acc;
 164  164  
 165  165          /*
 166  166           * We use a transfer attribute for the rings as they require 64-byte
 167  167           * boundaries.
 168  168           */
 169  169          xhci_dma_acc_attr(xhcip, &acc);
 170  170          xhci_dma_transfer_attr(xhcip, &attr, XHCI_DEF_DMA_SGL);
 171  171          bzero(xrp, sizeof (xhci_ring_t));
 172  172          if (xhci_dma_alloc(xhcip, &xrp->xr_dma, &attr, &acc, B_FALSE,
 173  173              xhcip->xhci_caps.xcap_pagesize, B_FALSE) == B_FALSE)
 174  174                  return (ENOMEM);
 175  175          xrp->xr_trb = (xhci_trb_t *)xrp->xr_dma.xdb_va;
 176  176          xrp->xr_ntrb = xhcip->xhci_caps.xcap_pagesize / sizeof (xhci_trb_t);
 177  177          return (0);
 178  178  }
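
Taken together, the allocation, reset, and free routines compose as in this
hedged usage sketch (not code from this file; it assumes a valid xhci_t
pointer and elides the caller's broader error handling):

    xhci_ring_t ring;

    if (xhci_ring_alloc(xhcip, &ring) != 0)
            return (ENOMEM);
    if (xhci_ring_reset(xhcip, &ring) != 0) {
            xhci_ring_free(&ring);
            return (EIO);
    }
    /* ... produce and consume TRBs ... */
    xhci_ring_free(&ring);
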
 179  179  
 180  180  /*
 181  181   * Note, caller should have already synced our DMA memory. This should not be
 182  182   * used for the command ring, as its cycle is maintained by the cycling of the
 183  183   * head. This function is only used for managing the event ring.
 184  184   */
 185  185  xhci_trb_t *
 186  186  xhci_ring_event_advance(xhci_ring_t *xrp)
 187  187  {
 188  188          xhci_trb_t *trb = &xrp->xr_trb[xrp->xr_tail];
 189  189          VERIFY(xrp->xr_tail < xrp->xr_ntrb);
 190  190  
 191  191          if (xrp->xr_cycle != (LE_32(trb->trb_flags) & XHCI_TRB_CYCLE))
 192  192                  return (NULL);
 193  193  
 194  194          /*
 195  195           * The event ring does not use a link TRB. It instead always uses
 196  196   * information based on the event ring segment table to wrap. That means
 197  197   * that the last entry is in fact going to contain data, so we shouldn't
 198  198   * wrap and toggle the cycle until after we've processed it, in other
 199  199   * words until the tail equals the total number of entries.
 200  200           */
 201  201          xrp->xr_tail++;
 202  202          if (xrp->xr_tail == xrp->xr_ntrb) {
 203  203                  xrp->xr_cycle ^= 1;
 204  204                  xrp->xr_tail = 0;
 205  205          }
 206  206  
 207  207          return (trb);
 208  208  }
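
An event-ring consumer would typically sync the ring's DMA memory for the
kernel and then drain entries until the cycle bit stops matching, along these
lines (sketch only; handle_event() is a hypothetical dispatch helper, not a
function in this driver):

    xhci_trb_t *trb;

    XHCI_DMA_SYNC(ring->xr_dma, DDI_DMA_SYNC_FORKERNEL);
    while ((trb = xhci_ring_event_advance(ring)) != NULL)
            handle_event(xhcip, trb);
    /* Finally, publish the new tail via the event ring dequeue register. */
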
 209  209  
 210  210  /*
 211  211   * When processing the command ring, we're going to get a single event for each
 212  212   * entry in it. As we've submitted things in order, we need to make sure that
 213  213   * this address matches the DMA address that we'd expect of the current tail.
 214  214   */
 215  215  boolean_t
 216  216  xhci_ring_trb_tail_valid(xhci_ring_t *xrp, uint64_t dma)
 217  217  {
 218  218          uint64_t tail;
 219  219  
 220  220          tail = xhci_dma_pa(&xrp->xr_dma) + xrp->xr_tail * sizeof (xhci_trb_t);
 221  221          return (dma == tail);
 222  222  }
 223  223  
 224  224  /*
 225  225   * A variant on the above that checks for a given message within a range of
 226  226   * entries and returns the offset to it from the tail.
 227  227   */
 228  228  int
 229  229  xhci_ring_trb_valid_range(xhci_ring_t *xrp, uint64_t dma, uint_t range)
 230  230  {
 231  231          uint_t i;
 232  232          uint_t tail = xrp->xr_tail;
 233  233          uint64_t taddr;
 234  234  
 235  235          VERIFY(range < xrp->xr_ntrb);
 236  236          for (i = 0; i < range; i++) {
 237  237                  taddr = xhci_dma_pa(&xrp->xr_dma) + tail * sizeof (xhci_trb_t);
 238  238                  if (taddr == dma)
 239  239                          return (i);
 240  240  
 241  241                  tail++;
 242  242                  if (tail == xrp->xr_ntrb - 1)
 243  243                          tail = 0;
 244  244          }
 245  245  
 246  246          return (-1);
 247  247  }
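
The matching in both routines is plain offset arithmetic. As a worked example
with illustrative values (a ring whose physical base is 0x1000000, with
16-byte TRBs):

    uint64_t base = 0x1000000;      /* xhci_dma_pa(&xrp->xr_dma) */
    uint_t tail = 42;               /* current tail index */
    uint64_t taddr = base + tail * sizeof (xhci_trb_t);
    /* taddr == 0x10002a0; an event carrying this address matches the tail. */
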
 248  248  
 249  249  /*
 250  250   * Determine whether or not we have enough space in a given ring for the
 251  251   * given request. Note, we have to be a bit careful here and ensure
 252  252   * that we properly handle cases where we cross the link TRB and that we don't
 253  253   * count it.
 254  254   *
 255  255   * To determine if we have enough space for a given number of trbs, we need to
 256  256   * logically advance the head pointer and make sure that we don't cross the tail
 257  257   * pointer. In other words, if after advancement, head == tail, we're in
 258  258   * trouble and don't have enough space.
 259  259   */
 260  260  boolean_t
 261  261  xhci_ring_trb_space(xhci_ring_t *xrp, uint_t ntrb)
 262  262  {
 263  263          uint_t i;
 264  264          uint_t head = xrp->xr_head;
 265  265  
 266  266          VERIFY(ntrb > 0);
 267  267          /* We use < to ignore the link TRB */
 268  268          VERIFY(ntrb < xrp->xr_ntrb);
 269  269  
 270  270          for (i = 0; i < ntrb; i++) {
 271  271                  head++;
 272  272                  if (head == xrp->xr_ntrb - 1) {
 273  273                          head = 0;
 274  274                  }
 275  275  
 276  276                  if (head == xrp->xr_tail)
 277  277                          return (B_FALSE);
 278  278          }
 279  279  
 280  280          return (B_TRUE);
  
  
 281  281  }
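
Producers are expected to perform this check before filling TRBs; a minimal
caller pattern might look like the following (sketch only; the retry policy
is up to the caller):

    if (xhci_ring_trb_space(ring, ntrb) == B_FALSE)
            return (EAGAIN);        /* no room between head and tail yet */
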
 282  282  
 283  283  /*
 284  284   * Fill in a TRB in the ring at offset trboff. If cycle is currently set to
 285  285   * B_TRUE, then we fill in the appropriate cycle bit to tell the system to
 286  286   * advance, otherwise we leave the existing cycle bit untouched so the system
 287  287   * doesn't accidentally advance until we have everything filled in.
 288  288   */
 289  289  void
 290  290  xhci_ring_trb_fill(xhci_ring_t *xrp, uint_t trboff, xhci_trb_t *host_trb,
 291      -    boolean_t put_cycle)
      291 +    uint64_t *trb_pap, boolean_t put_cycle)
 292  292  {
 293  293          uint_t i;
 294  294          uint32_t flags;
 295  295          uint_t ent = xrp->xr_head;
 296  296          uint8_t cycle = xrp->xr_cycle;
 297  297          xhci_trb_t *trb;
 298  298  
 299  299          for (i = 0; i < trboff; i++) {
 300  300                  ent++;
 301  301                  if (ent == xrp->xr_ntrb - 1) {
 302  302                          ent = 0;
 303  303                          cycle ^= 1;
 304  304                  }
 305  305          }
 306  306  
 307  307          /*
 308  308           * If we're being asked to not yet mark this TRB as valid to be
 309  309           * produced, we need to xor this once again so the cycle holds the
 310  310           * not-yet-consumable value.
 311  311           */
 312  312          if (put_cycle == B_FALSE)
 313  313                  cycle ^= 1;
 314  314  
 315  315          trb = &xrp->xr_trb[ent];
 316  316  
  
  
 317  317          trb->trb_addr = host_trb->trb_addr;
 318  318          trb->trb_status = host_trb->trb_status;
 319  319          flags = host_trb->trb_flags;
 320  320          if (cycle == 0) {
 321  321                  flags &= ~LE_32(XHCI_TRB_CYCLE);
 322  322          } else {
 323  323                  flags |= LE_32(XHCI_TRB_CYCLE);
 324  324          }
 325  325  
 326  326          trb->trb_flags = flags;
      327 +
      328 +        if (trb_pap != NULL) {
      329 +                uint64_t pa;
      330 +
      331 +                /*
      332 +                 * This logic only works if we have a single cookie address.
       333 + * However, this is pretty tightly assumed for rings through
      334 +                 * the xhci driver at this time.
      335 +                 */
      336 +                ASSERT3U(xrp->xr_dma.xdb_ncookies, ==, 1);
      337 +                pa = xrp->xr_dma.xdb_cookies[0].dmac_laddress;
      338 +                pa += ((uintptr_t)trb - (uintptr_t)&xrp->xr_trb[0]);
      339 +                *trb_pap = pa;
      340 +        }
 327  341  }
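
Together with xhci_ring_trb_produce() below, the intended multi-TRB
submission pattern is to fill every TRB with its valid cycle except the
first, and let the produce step flip the first TRB's cycle last so the whole
TD is published at once. A hedged sketch (trbs[] is a hypothetical array of
prepared host TRBs):

    uint_t i;

    for (i = 0; i < ntrb; i++) {
            xhci_ring_trb_fill(ring, i, &trbs[i], NULL,
                i == 0 ? B_FALSE : B_TRUE);
    }
    xhci_ring_trb_produce(ring, ntrb);
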
 328  342  
 329  343  /*
 330  344   * Update our metadata for the ring and verify the cycle bit is correctly set
 331  345   * for the first trb. It is expected that it is incorrectly set.
 332  346   */
 333  347  void
 334  348  xhci_ring_trb_produce(xhci_ring_t *xrp, uint_t ntrb)
 335  349  {
 336  350          uint_t i, ohead;
 337  351          xhci_trb_t *trb;
 338  352  
 339  353          VERIFY(ntrb > 0);
 340  354  
 341  355          ohead = xrp->xr_head;
 342  356  
 343  357          /*
 344  358           * As part of updating the head, we need to make sure we correctly
 345  359           * update the cycle bit of the link TRB. So we always do this first
 346  360           * before we update the old head, to try and get a consistent view of
 347  361           * the cycle bit.
 348  362           */
 349  363          for (i = 0; i < ntrb; i++) {
 350  364                  xrp->xr_head++;
 351  365                  /*
 352  366                   * If we're updating the link TRB, we also need to make sure
 353  367                   * that the Chain bit is set if we're in the middle of a TD
 354  368   * composed of multiple TRBs. Thankfully the algorithm here is
 355  369                   * simple: set it to the value of the previous TRB.
 356  370                   */
 357  371                  if (xrp->xr_head == xrp->xr_ntrb - 1) {
 358  372                          trb = &xrp->xr_trb[xrp->xr_ntrb - 1];
 359  373                          if (xrp->xr_trb[xrp->xr_ntrb - 2].trb_flags &
 360  374                              XHCI_TRB_CHAIN) {
 361  375                                  trb->trb_flags |= XHCI_TRB_CHAIN;
 362  376                          } else {
 363  377                                  trb->trb_flags &= ~XHCI_TRB_CHAIN;
 364  378  
 365  379                          }
 366  380                          trb->trb_flags ^= LE_32(XHCI_TRB_CYCLE);
 367  381                          xrp->xr_cycle ^= 1;
 368  382                          xrp->xr_head = 0;
 369  383                  }
 370  384          }
 371  385  
 372  386          trb = &xrp->xr_trb[ohead];
  
  
 373  387          trb->trb_flags ^= LE_32(XHCI_TRB_CYCLE);
 374  388  }
 375  389  
 376  390  /*
 377  391   * This is a convenience wrapper for the single TRB case to make callers less
 378  392   * likely to mess up some of the required semantics.
 379  393   */
 380  394  void
 381  395  xhci_ring_trb_put(xhci_ring_t *xrp, xhci_trb_t *trb)
 382  396  {
 383      -        xhci_ring_trb_fill(xrp, 0U, trb, B_FALSE);
      397 +        xhci_ring_trb_fill(xrp, 0U, trb, NULL, B_FALSE);
 384  398          xhci_ring_trb_produce(xrp, 1U);
 385  399  }
 386  400  
 387  401  /*
 388  402   * Update the tail pointer for a ring based on the DMA address of a consumed
 389  403   * entry. Note, this entry indicates what we just processed, therefore we should
 390  404   * bump the tail entry to the next one.
 391  405   */
 392  406  boolean_t
 393  407  xhci_ring_trb_consumed(xhci_ring_t *xrp, uint64_t dma)
 394  408  {
 395  409          uint64_t pa = xhci_dma_pa(&xrp->xr_dma);
 396  410          uint64_t high = pa + xrp->xr_ntrb * sizeof (xhci_trb_t);
 397  411  
 398  412          if (dma < pa || dma >= high ||
 399  413              dma % sizeof (xhci_trb_t) != 0)
 400  414                  return (B_FALSE);
 401  415  
 402  416          dma -= pa;
 403  417          dma /= sizeof (xhci_trb_t);
 404  418  
 405  419          VERIFY(dma < xrp->xr_ntrb);
 406  420  
 407  421          xrp->xr_tail = dma + 1;
 408  422          if (xrp->xr_tail == xrp->xr_ntrb - 1)
 409  423                  xrp->xr_tail = 0;
 410  424  
 411  425          return (B_TRUE);
 412  426  }
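
The index recovery here is the inverse of the fill-side arithmetic. A worked
example with illustrative values:

    uint64_t pa = 0x1000000;        /* ring physical base */
    uint64_t dma = 0x1000500;       /* address reported by the event TRB */
    /* (dma - pa) / sizeof (xhci_trb_t) == 0x50 == index 80, so the */
    /* tail advances to 81 (modulo the link TRB wrap above). */
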
 413  427  
 414  428  /*
 415  429   * The ring represented here has been reset and we're being asked to basically
 416  430   * skip all outstanding entries. Note, this shouldn't be used for the event
 417  431   * ring. Because the cycle bit is toggled whenever the head moves past the link
 418  432   * trb, the cycle bit is already correct. So in this case, it's really just a
 419  433   * matter of setting the current tail equal to the head, at which point we
 420  434   * consider things empty.
 421  435   */
 422  436  void
 423  437  xhci_ring_skip(xhci_ring_t *xrp)
 424  438  {
 425  439          xrp->xr_tail = xrp->xr_head;
 426  440  }
 427  441  
 428  442  /*
 429  443   * A variant on the normal skip. This basically just tells us to make sure
 430  444   * that everything this transfer represents has been skipped. Callers need to
 431  445   * make sure that this is actually the first transfer in the ring. Like above,
 432  446   * we don't need to touch the cycle bit.
 433  447   */
 434  448  void
 435  449  xhci_ring_skip_transfer(xhci_ring_t *xrp, xhci_transfer_t *xt)
 436  450  {
 437  451          uint_t i;
 438  452  
 439  453          for (i = 0; i < xt->xt_ntrbs; i++) {
 440  454                  xrp->xr_tail++;
 441  455                  if (xrp->xr_tail == xrp->xr_ntrb - 1)
 442  456                          xrp->xr_tail = 0;
 443  457          }
 444  458  }
  
(51 lines elided)