MFV: illumos-gate@2aba3acda67326648fd60aaf2bfb4e18ee8c04ed
9816 Multi-TRB xhci transfers should use event data
9817 xhci needs to always set slot context
8550 increase xhci bulk transfer sgl count
9818 xhci_transfer_get_tdsize can return values that are too large
Reviewed by: Alex Wilson <alex.wilson@joyent.com>
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Approved by: Joshua M. Clulow <josh@sysmgr.org>
Author: Robert Mustacchi <rm@joyent.com>
--- old/usr/src/uts/common/io/usb/hcd/xhci/xhci_ring.c
+++ new/usr/src/uts/common/io/usb/hcd/xhci/xhci_ring.c
1 1 /*
2 2 * This file and its contents are supplied under the terms of the
3 3 * Common Development and Distribution License ("CDDL"), version 1.0.
4 4 * You may only use this file in accordance with the terms of version
5 5 * 1.0 of the CDDL.
6 6 *
7 7 * A full copy of the text of the CDDL should have accompanied this
8 8 * source. A copy of the CDDL is also available via the Internet at
9 9 * http://www.illumos.org/license/CDDL.
10 10 */
11 11
12 12 /*
13 - * Copyright 2016 Joyent, Inc.
13 + * Copyright (c) 2018, Joyent, Inc.
14 14 */
15 15
16 16 /*
17 17 * -----------------------------
18 18 * xHCI Ring Management Routines
19 19 * -----------------------------
20 20 *
21 21 * There are three major different types of rings for xHCI, these are:
22 22 *
23 23 * 1) Command Rings
24 24 * 2) Event Rings
25 25 * 3) Transfer Rings
26 26 *
27 27 * Command and Transfer rings function in similar ways while the event rings are
28 28 * different. The difference comes in who is the consumer and who is the
29 29 * producer. In the case of command and transfer rings, the driver is the
30 30 * producer. For the event ring the driver is the consumer.
31 31 *
32 32 * Each ring in xhci has a synthetic head and tail register. Each entry in a
33 33 * ring has a bit that's often referred to as the 'Cycle bit'. The cycle bit is
34 34 * toggled as a means of saying that a given entry needs to be consumed.
35 35 *
36 36 * When a ring is created, all of the data in it is initialized to zero and the
37 37 * producer and consumer agree that when the cycle bit is toggled, the ownership
38 38 * of the entry is transferred from the producer to the consumer. For example,
39 39 * the command ring defaults to saying that a cycle bit of one is what indicates
40 40 * the command is owned by the hardware. So as the driver (the producer) fills
41 41 * in entries, the driver toggles the cycle bit from 0->1 as part of writing out
42 42 * the TRB. When the command ring's doorbell is rung, the hardware (the
43 43 * consumer) begins processing commands. It will process them until one of two
44 44 * things happens:
45 45 *
46 46 * 1) The hardware encounters an entry with the old cycle bit (0 in this case)
47 47 *
48 48 * 2) The hardware hits the last entry in the ring which is a special kind of
49 49 * entry called a LINK TRB.
50 50 *
51 51 * A LINK TRB has two purposes:
52 52 *
53 53 * 1) Indicate where processing should be redirected. This can potentially be to
54 54 * another memory segment; however, this driver always programs LINK TRBs to
55 55 * point back to the start of the ring.
56 56 *
57 57 * 2) Indicate whether or not the cycle bit should be changed. We always
58 58 * indicate that the cycle bit should be toggled when a LINK TRB is processed.
59 59 *
60 60 * In this same example, whereas the driver (the producer) would be setting the
61 61 * cycle to 1 to indicate that an entry is to be processed, the driver would now
62 62 * set it to 0. Similarly, the hardware (the consumer) would be looking for a
63 63 * 0 to determine whether or not it should process the entry.
64 64 *
65 65 * Currently, when the driver allocates rings, it always allocates a single page
66 66 * for the ring. The entire page is dedicated to ring use, which is determined
67 67 * based on the device's PAGESIZE register. The last entry in a given page is
68 68 * always configured as a LINK TRB. As each entry in a ring is 16 bytes, this
69 69 * gives us 255 usable descriptors on x86 and 511 on SPARC, as
70 70 * PAGESIZE is 4k and 8k respectively.
71 71 *
72 72 * The driver is always the producer for all rings except for the event ring,
73 73 * where it is the consumer.
74 74 *
75 75 * ----------------------
76 76 * Head and Tail Pointers
77 77 * ----------------------
78 78 *
79 79 * Now, while we have the cycle bits for the ring explained, we still need to
80 80 * keep track of what we consider the head and tail pointers, what the xHCI
81 81 * specification calls the enqueue (head) and dequeue (tail) pointers. In all
82 82 * the cases here, the actual tracking of the head pointer is basically done by
83 83 * the cycle bit; however, we maintain an actual offset in the xhci_ring_t
84 84 * structure. The tail is usually less synthetic; however, which side maintains
85 85 * it differs from ring to ring.
86 86 *
87 87 * We handle the command and transfer rings the same way. The head pointer
88 88 * indicates where we should insert the next TRB to transfer. The tail pointer
89 89 * indicates the last thing that hardware has told us it has processed. If the
90 90 * head and tail point to the same index, then we know the ring is empty.
91 91 *
92 92 * We increment the head pointer whenever we insert an entry. Note that we do
93 93 * not tell hardware about this in any way, it's just maintained by the cycle
94 94 * bit. Then, we keep track of what hardware has processed in our tail pointer,
95 95 * incrementing it only when we have an interrupt that indicates that it's been
96 96 * processed.
97 97 *
98 98 * One oddity here is that we only get notified of this via the event ring. So
99 99 * when event ring processing encounters this information, it must go back and
100 100 * increment our command and transfer ring tails after processing events.
101 101 *
102 102 * For the event ring, we handle things differently. We still initialize
103 103 * everything to zero; however, we start processing things and looking at cycle
104 104 * bits only when we get an interrupt from hardware. With the event ring, we do
105 105 * *not* maintain a head pointer (it's still in the structure, but unused). We
106 106 * always start processing at the tail pointer and use the cycle bit to indicate
107 107 * what we should process. Once we're done incrementing things, we go and notify
108 108 * the hardware of how far we got with this process by updating the tail for the
109 109 * event ring via a memory mapped register.
110 110 */
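
As a concrete model of the cycle-bit protocol described above, the sketch below shows a producer advancing through a ring whose last slot acts as the link TRB. This is a simplified user-space illustration under assumed types, sizes, and names, not the driver's actual structures:

    #include <stdint.h>

    #define NTRB      256   /* 4k page / 16-byte TRBs */
    #define TRB_CYCLE 0x1

    typedef struct trb {
            uint64_t addr;
            uint32_t status;
            uint32_t flags;
    } trb_t;

    typedef struct ring {
            trb_t    trb[NTRB];
            uint32_t head;   /* next slot to fill */
            uint8_t  cycle;  /* producer cycle; starts at 1 after reset */
    } ring_t;

    /*
     * Produce one entry: write it with the current cycle bit so the
     * consumer treats it as owned, then advance the head. Reaching the
     * last slot (the link TRB) wraps the head to 0 and toggles the
     * cycle, mirroring the link TRB's toggle-cycle behavior.
     */
    static void
    ring_produce(ring_t *rp, const trb_t *src)
    {
            trb_t *trb = &rp->trb[rp->head];

            *trb = *src;
            if (rp->cycle != 0)
                    trb->flags |= TRB_CYCLE;
            else
                    trb->flags &= ~TRB_CYCLE;

            if (++rp->head == NTRB - 1) {
                    rp->head = 0;
                    rp->cycle ^= 1;
            }
    }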
111 111
112 112 #include <sys/usb/hcd/xhci/xhci.h>
113 113
114 114 void
115 115 xhci_ring_free(xhci_ring_t *xrp)
116 116 {
117 117 if (xrp->xr_trb != NULL) {
118 118 xhci_dma_free(&xrp->xr_dma);
119 119 xrp->xr_trb = NULL;
120 120 }
121 121 xrp->xr_ntrb = 0;
122 122 xrp->xr_head = 0;
123 123 xrp->xr_tail = 0;
124 124 xrp->xr_cycle = 0;
125 125 }
126 126
127 127 /*
128 128 * Initialize a ring that hasn't been used and set up its link pointer back to
129 129 * it.
130 130 */
131 131 int
132 132 xhci_ring_reset(xhci_t *xhcip, xhci_ring_t *xrp)
133 133 {
134 134 xhci_trb_t *ltrb;
135 135
136 136 ASSERT(xrp->xr_trb != NULL);
137 137
138 138 bzero(xrp->xr_trb, sizeof (xhci_trb_t) * xrp->xr_ntrb);
139 139 xrp->xr_head = 0;
140 140 xrp->xr_tail = 0;
141 141 xrp->xr_cycle = 1;
142 142
143 143 /*
144 144 * Set up the link TRB back to ourselves.
145 145 */
146 146 ltrb = &xrp->xr_trb[xrp->xr_ntrb - 1];
147 147 ltrb->trb_addr = LE_64(xhci_dma_pa(&xrp->xr_dma));
148 148 ltrb->trb_flags = LE_32(XHCI_TRB_TYPE_LINK | XHCI_TRB_LINKSEG);
149 149
150 150 XHCI_DMA_SYNC(xrp->xr_dma, DDI_DMA_SYNC_FORDEV);
151 151 if (xhci_check_dma_handle(xhcip, &xrp->xr_dma) != DDI_FM_OK) {
152 152 ddi_fm_service_impact(xhcip->xhci_dip, DDI_SERVICE_LOST);
153 153 return (EIO);
154 154 }
155 155
156 156 return (0);
157 157 }
158 158
159 159 int
160 160 xhci_ring_alloc(xhci_t *xhcip, xhci_ring_t *xrp)
161 161 {
162 162 ddi_dma_attr_t attr;
163 163 ddi_device_acc_attr_t acc;
164 164
165 165 /*
166 166 * We use a transfer attribute for the rings as they require 64-byte
167 167 * boundaries.
168 168 */
169 169 xhci_dma_acc_attr(xhcip, &acc);
170 170 xhci_dma_transfer_attr(xhcip, &attr, XHCI_DEF_DMA_SGL);
171 171 bzero(xrp, sizeof (xhci_ring_t));
172 172 if (xhci_dma_alloc(xhcip, &xrp->xr_dma, &attr, &acc, B_FALSE,
173 173 xhcip->xhci_caps.xcap_pagesize, B_FALSE) == B_FALSE)
174 174 return (ENOMEM);
175 175 xrp->xr_trb = (xhci_trb_t *)xrp->xr_dma.xdb_va;
176 176 xrp->xr_ntrb = xhcip->xhci_caps.xcap_pagesize / sizeof (xhci_trb_t);
177 177 return (0);
178 178 }
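
For a controller whose PAGESIZE register reports 4k, the sizing arithmetic here works out as follows; these are illustrative numbers, not additional driver code:

    xr_ntrb = 4096 / sizeof (xhci_trb_t);  /* 4096 / 16 == 256 slots */
    /* 256 - 1 == 255 usable TRBs, since the last slot holds the link TRB */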
179 179
180 180 /*
181 181 * Note, caller should have already synced our DMA memory. This should not be
182 182 * used for the command ring, as its cycle is maintained by the cycling of the
183 183 * head. This function is only used for managing the event ring.
184 184 */
185 185 xhci_trb_t *
186 186 xhci_ring_event_advance(xhci_ring_t *xrp)
187 187 {
188 188 xhci_trb_t *trb = &xrp->xr_trb[xrp->xr_tail];
189 189 VERIFY(xrp->xr_tail < xrp->xr_ntrb);
190 190
191 191 if (xrp->xr_cycle != (LE_32(trb->trb_flags) & XHCI_TRB_CYCLE))
192 192 return (NULL);
193 193
194 194 /*
195 195 * The event ring does not use a link TRB. It instead always uses
196 196 * information based on the table to wrap. That means that the last
197 197 * entry is in fact going to contain data, so we shouldn't wrap and
198 198 * toggle the cycle until after we've processed that entry; in other words,
199 199 * not until the tail equals the total number of entries.
200 200 */
201 201 xrp->xr_tail++;
202 202 if (xrp->xr_tail == xrp->xr_ntrb) {
203 203 xrp->xr_cycle ^= 1;
204 204 xrp->xr_tail = 0;
205 205 }
206 206
207 207 return (trb);
208 208 }
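
A hypothetical caller-side loop over this function could look like the sketch below; process_event() and the final register write are placeholders for logic that lives elsewhere in the driver, not functions defined here:

    xhci_trb_t *trb;

    /*
     * Consume every entry hardware has produced since the last
     * interrupt; xhci_ring_event_advance() returns NULL at the first
     * TRB whose cycle bit does not match, i.e. one we do not yet own.
     */
    while ((trb = xhci_ring_event_advance(xrp)) != NULL)
            process_event(trb);  /* hypothetical handler */

    /* Finally, report the new tail via the event ring's dequeue register. */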
209 209
210 210 /*
211 211 * When processing the command ring, we're going to get a single event for each
212 212 * entry in it. As we've submitted things in order, we need to make sure that
213 213 * this address matches the DMA address that we'd expect of the current tail.
214 214 */
215 215 boolean_t
216 216 xhci_ring_trb_tail_valid(xhci_ring_t *xrp, uint64_t dma)
217 217 {
218 218 uint64_t tail;
219 219
220 220 tail = xhci_dma_pa(&xrp->xr_dma) + xrp->xr_tail * sizeof (xhci_trb_t);
221 221 return (dma == tail);
222 222 }
223 223
224 224 /*
225 225 * A variant on the above that checks for a given message within a range of
226 226 * entries and returns the offset to it from the tail.
227 227 */
228 228 int
229 229 xhci_ring_trb_valid_range(xhci_ring_t *xrp, uint64_t dma, uint_t range)
230 230 {
231 231 uint_t i;
232 232 uint_t tail = xrp->xr_tail;
233 233 uint64_t taddr;
234 234
235 235 VERIFY(range < xrp->xr_ntrb);
236 236 for (i = 0; i < range; i++) {
237 237 taddr = xhci_dma_pa(&xrp->xr_dma) + tail * sizeof (xhci_trb_t);
238 238 if (taddr == dma)
239 239 return (i);
240 240
241 241 tail++;
242 242 if (tail == xrp->xr_ntrb - 1)
243 243 tail = 0;
244 244 }
245 245
246 246 return (-1);
247 247 }
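
As a worked example of the wrap behavior, using illustrative values: on a 256-slot ring with tail == 254, a range of 3 checks slots 254, 0, and 1 in that order, since the wrap skips slot 255 (the link TRB); a match returns its distance from the tail (0, 1, or 2), while -1 indicates the address was not found in the range.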
248 248
249 249 /*
250 250 * Determine whether or not we have enough space in a given ring for the
251 251 * given request. Note, we have to be a bit careful here and ensure
252 252 * that we properly handle cases where we cross the link TRB and that we don't
253 253 * count it.
254 254 *
255 255 * To determine if we have enough space for a given number of trbs, we need to
256 256 * logically advance the head pointer and make sure that we don't cross the tail
257 257 * pointer. In other words, if after advancement, head == tail, we're in
258 258 * trouble and don't have enough space.
259 259 */
260 260 boolean_t
261 261 xhci_ring_trb_space(xhci_ring_t *xrp, uint_t ntrb)
262 262 {
263 263 uint_t i;
264 264 uint_t head = xrp->xr_head;
265 265
266 266 VERIFY(ntrb > 0);
267 267 /* We use < to ignore the link TRB */
268 268 VERIFY(ntrb < xrp->xr_ntrb);
269 269
270 270 for (i = 0; i < ntrb; i++) {
271 271 head++;
272 272 if (head == xrp->xr_ntrb - 1) {
273 273 head = 0;
274 274 }
275 275
276 276 if (head == xrp->xr_tail)
277 277 return (B_FALSE);
278 278 }
279 279
280 280 return (B_TRUE);
281 281 }
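
To make this check concrete with illustrative values: on a 256-slot ring (slot 255 being the link TRB) with head == 250 and tail == 252, a request for one TRB advances the head to 251 and returns B_TRUE, while a request for two advances it to 252, collides with the tail, and returns B_FALSE. Note also that advancing from slot 254 wraps straight to 0, so the link TRB slot is never counted against the requested space.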
282 282
283 283 /*
284 284 * Fill in a TRB in the ring at offset trboff. If put_cycle is currently set
285 285 * to B_TRUE, then we fill in the appropriate cycle bit to tell the system to
286 286 * advance; otherwise, we write the stale cycle bit so the system doesn't
287 287 * accidentally advance until we have everything filled in.
288 288 */
289 289 void
290 290 xhci_ring_trb_fill(xhci_ring_t *xrp, uint_t trboff, xhci_trb_t *host_trb,
291 - boolean_t put_cycle)
291 + uint64_t *trb_pap, boolean_t put_cycle)
292 292 {
293 293 uint_t i;
294 294 uint32_t flags;
295 295 uint_t ent = xrp->xr_head;
296 296 uint8_t cycle = xrp->xr_cycle;
297 297 xhci_trb_t *trb;
298 298
299 299 for (i = 0; i < trboff; i++) {
300 300 ent++;
301 301 if (ent == xrp->xr_ntrb - 1) {
302 302 ent = 0;
303 303 cycle ^= 1;
304 304 }
305 305 }
306 306
307 307 /*
308 308 * If we're being asked not to make this entry valid for production yet, we
309 309 * need to XOR the cycle once more to get back to the stale value so that
310 310 * hardware will not consume the entry.
311 311 */
312 312 if (put_cycle == B_FALSE)
313 313 cycle ^= 1;
314 314
315 315 trb = &xrp->xr_trb[ent];
316 316
317 317 trb->trb_addr = host_trb->trb_addr;
318 318 trb->trb_status = host_trb->trb_status;
319 319 flags = host_trb->trb_flags;
320 320 if (cycle == 0) {
321 321 flags &= ~LE_32(XHCI_TRB_CYCLE);
322 322 } else {
323 323 flags |= LE_32(XHCI_TRB_CYCLE);
324 324 }
325 325
326 326 trb->trb_flags = flags;
327 +
328 + if (trb_pap != NULL) {
329 + uint64_t pa;
330 +
331 + /*
332 + * This logic only works if we have a single cookie address.
333 + * However, this is pretty tightly assumed for rings through
334 + * the xhci driver at this time.
335 + */
336 + ASSERT3U(xrp->xr_dma.xdb_ncookies, ==, 1);
337 + pa = xrp->xr_dma.xdb_cookies[0].dmac_laddress;
338 + pa += ((uintptr_t)trb - (uintptr_t)&xrp->xr_trb[0]);
339 + *trb_pap = pa;
340 + }
327 341 }
328 342
329 343 /*
330 344 * Update our metadata for the ring and flip the cycle bit of the first
331 345 * TRB, which is expected to still hold the stale (invalid) value.
332 346 */
333 347 void
334 348 xhci_ring_trb_produce(xhci_ring_t *xrp, uint_t ntrb)
335 349 {
336 350 uint_t i, ohead;
337 351 xhci_trb_t *trb;
338 352
339 353 VERIFY(ntrb > 0);
340 354
341 355 ohead = xrp->xr_head;
342 356
343 357 /*
344 358 * As part of updating the head, we need to make sure we correctly
345 359 * update the cycle bit of the link TRB. So we always do this first
346 360 * before we update the old head, to try and get a consistent view of
347 361 * the cycle bit.
348 362 */
349 363 for (i = 0; i < ntrb; i++) {
350 364 xrp->xr_head++;
351 365 /*
352 366 * If we're updating the link TRB, we also need to make sure
353 367 * that the Chain bit is set if we're in the middle of a TD
354 368 * comprised of multiple TRBs. Thankfully the algorithm here is
355 369 * simple: set it to the value of the previous TRB.
356 370 */
357 371 if (xrp->xr_head == xrp->xr_ntrb - 1) {
358 372 trb = &xrp->xr_trb[xrp->xr_ntrb - 1];
359 373 if (xrp->xr_trb[xrp->xr_ntrb - 2].trb_flags &
360 374 XHCI_TRB_CHAIN) {
361 375 trb->trb_flags |= XHCI_TRB_CHAIN;
362 376 } else {
363 377 trb->trb_flags &= ~XHCI_TRB_CHAIN;
364 378
365 379 }
366 380 trb->trb_flags ^= LE_32(XHCI_TRB_CYCLE);
367 381 xrp->xr_cycle ^= 1;
368 382 xrp->xr_head = 0;
369 383 }
370 384 }
371 385
372 386 trb = &xrp->xr_trb[ohead];
373 387 trb->trb_flags ^= LE_32(XHCI_TRB_CYCLE);
374 388 }
375 389
376 390 /*
377 391 * This is a convenience wrapper for the single TRB case to make callers less
378 392 * likely to mess up some of the required semantics.
379 393 */
380 394 void
381 395 xhci_ring_trb_put(xhci_ring_t *xrp, xhci_trb_t *trb)
382 396 {
383 - xhci_ring_trb_fill(xrp, 0U, trb, B_FALSE);
397 + xhci_ring_trb_fill(xrp, 0U, trb, NULL, B_FALSE);
384 398 xhci_ring_trb_produce(xrp, 1U);
385 399 }
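
For the multi-TRB case, the fill/produce split exists so hardware never sees a half-built TD. A sketch of the expected calling pattern follows, where trbs[] and ntrb stand in for hypothetical caller state (the real callers live in the transfer-scheduling code):

    uint_t i;

    /*
     * Fill TRBs 1 .. ntrb - 1 with their final (valid) cycle bits;
     * hardware will not examine them until the first TRB is flipped.
     */
    for (i = 1; i < ntrb; i++)
            xhci_ring_trb_fill(xrp, i, &trbs[i], NULL, B_TRUE);

    /* Fill TRB 0 with the stale cycle so hardware cannot consume it yet. */
    xhci_ring_trb_fill(xrp, 0, &trbs[0], NULL, B_FALSE);

    /* After syncing DMA, flip TRB 0's cycle bit and advance the head. */
    xhci_ring_trb_produce(xrp, ntrb);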
386 400
387 401 /*
388 402 * Update the tail pointer for a ring based on the DMA address of a consumed
389 403 * entry. Note, this entry indicates what we just processed, therefore we should
390 404 * bump the tail entry to the next one.
391 405 */
392 406 boolean_t
393 407 xhci_ring_trb_consumed(xhci_ring_t *xrp, uint64_t dma)
394 408 {
395 409 uint64_t pa = xhci_dma_pa(&xrp->xr_dma);
396 410 uint64_t high = pa + xrp->xr_ntrb * sizeof (xhci_trb_t);
397 411
398 412 if (dma < pa || dma >= high ||
399 413 dma % sizeof (xhci_trb_t) != 0)
400 414 return (B_FALSE);
401 415
402 416 dma -= pa;
403 417 dma /= sizeof (xhci_trb_t);
404 418
405 419 VERIFY(dma < xrp->xr_ntrb);
406 420
407 421 xrp->xr_tail = dma + 1;
408 422 if (xrp->xr_tail == xrp->xr_ntrb - 1)
409 423 xrp->xr_tail = 0;
410 424
411 425 return (B_TRUE);
412 426 }
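
As a worked example with illustrative addresses: if the ring's physical base is 0x1000000 and an event reports dma == 0x1000040, then (0x1000040 - 0x1000000) / 16 == 4, so entry 4 was just consumed and the tail becomes 5. An address below the base, at or past the end of the ring, or not aligned to the 16-byte TRB size is rejected outright.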
413 427
414 428 /*
415 429 * The ring represented here has been reset and we're being asked to basically
416 430 * skip all outstanding entries. Note, this shouldn't be used for the event
417 431 * ring. Because the cycle bit is toggled whenever the head moves past the link
418 432 * trb, the cycle bit is already correct. So in this case, it's really just a
419 433 * matter of setting the current tail equal to the head, at which point we
420 434 * consider things empty.
421 435 */
422 436 void
423 437 xhci_ring_skip(xhci_ring_t *xrp)
424 438 {
425 439 xrp->xr_tail = xrp->xr_head;
426 440 }
427 441
428 442 /*
429 443 * A variant on the normal skip. This basically just tells us to make sure
430 444 * that everything this transfer represents has been skipped. Callers need to
431 445 * make sure that this is actually the first transfer in the ring. Like above,
432 446 * we don't need to touch the cycle bit.
433 447 */
434 448 void
435 449 xhci_ring_skip_transfer(xhci_ring_t *xrp, xhci_transfer_t *xt)
436 450 {
437 451 uint_t i;
438 452
439 453 for (i = 0; i < xt->xt_ntrbs; i++) {
440 454 xrp->xr_tail++;
441 455 if (xrp->xr_tail == xrp->xr_ntrb - 1)
442 456 xrp->xr_tail = 0;
443 457 }
444 458 }
51 lines elided