NEX-20178 Heavy read load using 10G i40e causes network disconnect
MFV illumos-joyent@83a8d0d616db36010b59cc850d1926c0f6a30de1
OS-7457 i40e Tx freezes on zero descriptors
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Rob Johnston <rob.johnston@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
MFV illumos-joyent@0d3f2b61dcfb18edace4fd257054f6fdbe07c99c
OS-7492 i40e Tx freeze when b_cont chain exceeds 8 descriptors
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Rob Johnston <rob.johnston@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
MFV illumos-joyent@b4bede175d4c50ac1b36078a677b69388f6fb59f
OS-7577 initialize FC for i40e
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Rob Johnston <rob.johnston@joyent.com>
NEX-19928 i40e: cannot create static IP address
Reviewed by: Cynthia Eastham <cynthia.eastham@nexenta.com>
MFV: illumos-joyent@b93056a35d6d6d301f24bc6631fc57dd2c8992c4
OS-7456 i40e default VSI sometimes lacks implicit L2 filter
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Ryan Zezeski <rpz@joyent.com>
MFV: illumos-joyent@61dc3dec4f82a3e13e94609a0a83d5f66c64e760
OS-6846 want i40e multi-group support
OS-7372 i40e_alloc_ring_mem() unwinds when it shouldn't
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Ryan Zezeski <rpz@joyent.com>
MFV: illumos-joyent@9e30beee2f0c127bf41868db46257124206e28d6
OS-5225 Want Fortville TSO support
Reviewed by: Ryan Zezeski <rpz@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Patrick Mooney <patrick.mooney@joyent.com>
Author: Rob Johnston <rob.johnston@joyent.com>
NEX-7822 40Gb Intel XL710 NIC performance data
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
   1 /*
   2  * This file and its contents are supplied under the terms of the
   3  * Common Development and Distribution License ("CDDL"), version 1.0.
   4  * You may only use this file in accordance with the terms of version
   5  * 1.0 of the CDDL.
   6  *
   7  * A full copy of the text of the CDDL should have accompanied this
   8  * source.  A copy of the CDDL is also available via the Internet at
   9  * http://www.illumos.org/license/CDDL.
  10  */
  11 
  12 /*
  13  * Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved.
  14  * Copyright (c) 2017, Joyent, Inc.
  15  * Copyright 2017 Tegile Systems, Inc.  All rights reserved.
  16  */
  17 
  18 /*
  19  * i40e - Intel 10/40 Gb Ethernet driver
  20  *
  21  * The i40e driver is the main software device driver for the Intel 40 Gb family
  22  * of devices. Note that these devices come in many flavors with both 40 GbE
  23  * ports and 10 GbE ports. This device is the successor to the 82599 family of
  24  * devices (ixgbe).
  25  *
  26  * Unlike previous generations of Intel 1 GbE and 10 GbE devices, the 40 GbE
  27  * devices defined in the XL710 controller (previously known as Fortville) are a
  28  * rather different beast and have a small switch embedded inside of them. In
  29  * addition, the way that most of the programming is done has been overhauled.
  30  * As opposed to just using PCIe memory mapped registers, it also has an
  31  * administrative queue which is used to communicate with firmware running on
  32  * the chip.
  33  *
  34  * Each physical function in the hardware shows up as a device that this driver


 120  * determining which functions belong to the same device. Nominally one might do
 121  * this by having a nexus driver; however, a prime requirement for a nexus
 122  * driver is identifying the various children and activating them. While it is
 123  * possible to get this information from NVRAM, we would end up duplicating a
 124  * lot of the PCI enumeration logic. Really, at the end of the day, the device
 125  * doesn't give us the traditional identification properties we want from a
 126  * nexus driver.
 127  *
 128  * Instead, we rely on some properties that are guaranteed to be unique. While
 129  * it might be tempting to leverage the PBA or serial number of the device from
 130  * NVRAM, there is nothing that says that two devices can't be mis-programmed to
 131  * have the same values in NVRAM. Instead, we uniquely identify a group of
 132  * functions based on their parent in the /devices tree, their PCI bus and PCI
 133  * device identifiers. Using any of these on its own may not be sufficient.
 134  *
 135  * For each unique PCI device that we encounter, we'll create a i40e_device_t.
 136  * From there, because we don't have a good way to tell the GLDv3 about sharing
 137  * resources between everything, we'll end up just dividing the resources
 138  * evenly between all of the functions. Longer term, if we don't have to declare
 139  * to the GLDv3 that these resources are shared, then we'll maintain a pool and
 140  * have each PF allocate from the pool in the device, thus if only two of four
 141  * ports are being used, for example, then all of the resources can still be
 142  * used.
 143  *
 144  * -------------------------------------------
 145  * Transmit and Receive Queue Pair Allocations
 146  * -------------------------------------------
 147  *
 148  * NVRAM ends up assigning each PF its own share of the transmit and receive LAN
 149  * queue pairs; we have no way of modifying it, only observing it. From there,
 150  * it's up to us to map these queues to VSIs and VFs. Since we don't support any
 151  * VFs at this time, we only focus on assignments to VSIs.
 152  *
 153  * At the moment, we use a static mapping of transmit/receive queue pairs to a
 154  * given VSI (e.g. rings to a group). Though in the fullness of time, we want to
 155  * make this something which is fully dynamic and take advantage of documented,
 156  * but not yet available functionality for adding filters based on VXLAN and
 157  * other encapsulation technologies.
 158  *
 159  * -------------------------------------
 160  * Broadcast, Multicast, and Promiscuous
 161  * -------------------------------------
 162  *
 163  * As part of the GLDv3, we need to make sure that we can handle receiving
 164  * broadcast and multicast traffic, as well as enabling promiscuous mode when
 165  * requested. GLDv3 requires that all broadcast and multicast traffic be
 166  * retrieved by the default group, i.e. the first one. This is the same thing as
 167  * the default VSI.
 168  *
 169  * To receive broadcast traffic, we enable it through the admin queue, rather
 170  * than use one of our filters for it. For multicast traffic, we reserve a
 171  * certain number of the hash filters and assign them to a given PF. When we
 172  * exceed those, we then switch to using promiscuous mode for multicast traffic.
 173  *
 174  * More specifically, once we exceed the number of filters (indicated because
 175  * the i40e_t`i40e_resources.ifr_nmcastfilt ==
 176  * i40e_t`i40e_resources.ifr_nmcastfilt_used), we then instead need to toggle
 177  * promiscuous mode. If promiscuous mode is toggled then we keep track of the
 178  * number of MACs added to it by incrementing i40e_t`i40e_mcast_promisc_count.
 179  * That will stay enabled until that count reaches zero indicating that we have
 180  * only added multicast addresses that we have a corresponding entry for.
 181  *
 182  * Because MAC itself wants to toggle promiscuous mode, which includes both
 183  * unicast and multicast traffic, we go through and keep track of that
 184  * ourselves. That is maintained through the use of the i40e_t`i40e_promisc_on
 185  * member.
 186  *
 187  * --------------
 188  * VSI Management
 189  * --------------
 190  *
 191  * At this time, we only support a single MAC group, and thus a single
 192  * VSI. This VSI is considered the default VSI and should be the only one that
 193  * exists after a reset. Currently it is stored as the member
 194  * i40e_t`i40e_vsi_id. While this works for the moment and for an initial
 195  * driver, it's not sufficient for the longer-term path of the driver. Instead,
 196  * we'll want to actually have a unique i40e_vsi_t structure which is used
 197  * everywhere. Note that this means that every place that uses the
 198  * i40e_t`i40e_vsi_id will need to be refactored.

 199  *
 200  * ----------------
 201  * Structure Layout
 202  * ----------------
 203  *
 204  * The following image relates the core data structures together. The primary
 205  * structure in the system is the i40e_t. It itself contains multiple rings,
 206  * i40e_trqpair_t's which contain the various transmit and receive data. The
 207  * receive data is stored outside of the i40e_trqpair_t and instead in the
 208  * i40e_rx_data_t. The i40e_t has a corresponding i40e_device_t which keeps
 209  * track of per-physical device state. Finally, for every active descriptor,
 210  * there is a corresponding control block, which is where the
 211  * i40e_rx_control_block_t and the i40e_tx_control_block_t come from.
 212  *
 213  *   +-----------------------+       +-----------------------+
 214  *   | Global i40e_t list    |       | Global Device list    |
 215  *   |                       |    +--|                       |
 216  *   | i40e_glist            |    |  | i40e_dlist            |
 217  *   +-----------------------+    |  +-----------------------+
 218  *       |                        v


 223  *       |      | dev_info_t *     ------+--> Parent in devices tree.
 224  *       |      | uint_t           ------+--> PCI bus number
 225  *       |      | uint_t           ------+--> PCI device number
 226  *       |      | uint_t           ------+--> Number of functions
 227  *       |      | i40e_switch_rsrcs_t ---+--> Captured total switch resources
 228  *       |      | list_t           ------+-------------+
 229  *       |      +------------------------+             |
 230  *       |                           ^                 |
 231  *       |                           +--------+        |
 232  *       |                                    |        v
 233  *       |  +---------------------------+     |   +-------------------+
 234  *       +->| GLDv3 Device, per PF      |-----|-->| GLDv3 Device (PF) |--> ...
 235  *          | i40e_t                    |     |   | i40e_t            |
 236  *          | **Primary Structure**     |     |   +-------------------+
 237  *          |                           |     |
 238  *          | i40e_device_t *         --+-----+
 239  *          | i40e_state_t            --+---> Device State
 240  *          | i40e_hw_t               --+---> Intel common code structure
 241  *          | mac_handle_t            --+---> GLDv3 handle to MAC
 242  *          | ddi_periodic_t          --+---> Link activity timer
 243  *          | int (vsi_id)            --+---> VSI ID, main identifier
 244  *          | i40e_func_rsrc_t        --+---> Available hardware resources
 245  *          | i40e_switch_rsrc_t *    --+---> Switch resource snapshot
 246  *          | i40e_sdu                --+---> Current MTU
 247  *          | i40e_frame_max          --+---> Current HW frame size
 248  *          | i40e_uaddr_t *          --+---> Array of assigned unicast MACs
 249  *          | i40e_maddr_t *          --+---> Array of assigned multicast MACs
 250  *          | i40e_mcast_promisccount --+---> Active multicast state
 251  *          | i40e_promisc_on         --+---> Current promiscuous mode state
 252  *          | int                     --+---> Number of transmit/receive pairs

 253  *          | kstat_t *               --+---> PF kstats
 254  *          | kstat_t *               --+---> VSI kstats
 255  *          | i40e_pf_stats_t         --+---> PF kstat backing data
 256  *          | i40e_vsi_stats_t        --+---> VSI kstat backing data
 257  *          | i40e_trqpair_t *        --+---------+
 258  *          +---------------------------+         |
 259  *                                                |
 260  *                                                v
 261  *  +-------------------------------+       +-----------------------------+
 262  *  | Transmit/Receive Queue Pair   |-------| Transmit/Receive Queue Pair |->...
 263  *  | i40e_trqpair_t                |       | i40e_trqpair_t              |
 264  *  + Ring Data Structure           |       +-----------------------------+
 265  *  |                               |
 266  *  | mac_ring_handle_t             +--> MAC RX ring handle
 267  *  | mac_ring_handle_t             +--> MAC TX ring handle
 268  *  | i40e_rxq_stat_t             --+--> RX Queue stats
 269  *  | i40e_txq_stat_t             --+--> TX Queue stats
 270  *  | uint32_t (tx ring size)       +--> TX Ring Size
 271  *  | uint32_t (tx free list size)  +--> TX Free List Size
 272  *  | i40e_dma_buffer_t     --------+--> TX Descriptor ring DMA
 273  *  | i40e_tx_desc_t *      --------+--> TX descriptor ring
 274  *  | volatile uint32_t *           +--> TX Write back head
 275  *  | uint32_t               -------+--> TX ring head
 276  *  | uint32_t               -------+--> TX ring tail


 342  *
 343  * 2) When grabbing locks between multiple transmit and receive queues, the
 344  * locks for the lowest number transmit/receive queue should be grabbed first.
 345  *
 346  * 3) When grabbing both the transmit and receive lock for a given queue, always
 347  * grab i40e_trqpair_t`itrq_rx_lock before the i40e_trqpair_t`itrq_tx_lock.
 348  *
 349  * 4) The following pairs of locks are not expected to be held at the same time:
 350  *
 351  * o i40e_t`i40e_rx_pending_lock and i40e_trqpair_t`itrq_tcb_lock
 352  *
 353  * -----------
 354  * Future Work
 355  * -----------
 356  *
 357  * At the moment the i40e driver is rather bare bones, allowing us to start
 358  * getting data flowing and folks using it while we develop additional features.
 359  * While bugs have been filed to cover this future work, the following gives an
 360  * overview of expected work:
 361  *
 362  *  o TSO support
 363  *  o Multiple group support
 364  *  o DMA binding and breaking up the locking in ring recycling.
 365  *  o Enhanced detection of device errors
 366  *  o Participation in IRM
 367  *  o FMA device reset
 368  *  o Stall detection, temperature error detection, etc.
 369  *  o More dynamic resource pools
 370  */
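
The multicast fallback described in the theory statement above can be sketched as a small standalone model. This is only an illustration under stated assumptions: the `mcast_state_t` type and the `mcast_add()`, `mcast_remove()`, and `mcast_promisc_enabled()` helpers are hypothetical names, and the real driver additionally programs the hardware through the admin queue, which is omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the multicast bookkeeping kept in the i40e_t. */
typedef struct mcast_state {
	uint32_t ms_nfilt;		/* filters reserved for this PF */
	uint32_t ms_nfilt_used;		/* filters currently in use */
	uint32_t ms_promisc_count;	/* addresses covered only by promisc */
} mcast_state_t;

/*
 * Add one multicast address. Returns true if it consumed a filter, false if
 * the filters were exhausted and the address is covered by multicast
 * promiscuous mode instead.
 */
static bool
mcast_add(mcast_state_t *ms)
{
	assert(ms->ms_nfilt_used <= ms->ms_nfilt);
	if (ms->ms_nfilt_used == ms->ms_nfilt) {
		ms->ms_promisc_count++;
		return (false);
	}
	ms->ms_nfilt_used++;
	return (true);
}

/* Remove one address; had_filter is the value mcast_add() returned for it. */
static void
mcast_remove(mcast_state_t *ms, bool had_filter)
{
	if (had_filter) {
		assert(ms->ms_nfilt_used > 0);
		ms->ms_nfilt_used--;
	} else {
		assert(ms->ms_promisc_count > 0);
		ms->ms_promisc_count--;
	}
}

/* Multicast promiscuous mode stays on until the count drops back to zero. */
static bool
mcast_promisc_enabled(const mcast_state_t *ms)
{
	return (ms->ms_promisc_count > 0);
}
```

With two reserved filters, the third added address would turn promiscuous mode on, and removing that same address would turn it off again.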
 371 
 372 #include "i40e_sw.h"
 373 
 374 static char i40e_ident[] = "Intel 10/40Gb Ethernet v1.0.1";
 375 
 376 /*
 377  * The i40e_glock primarily protects the lists below and the i40e_device_t
 378  * structures.
 379  */
 380 static kmutex_t i40e_glock;
 381 static list_t i40e_glist;
 382 static list_t i40e_dlist;
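
As a rough sketch of how entries on a device list like i40e_dlist could be matched, the following model keys each entry on the parent node, PCI bus number, and PCI device number, following the structure diagram in the theory statement. The `dev_entry_t` type and both helpers are hypothetical simplifications, not the driver's own definitions, and locking under i40e_glock is left out. The even split of resources among functions is modeled as plain integer division, which is an assumption about how "evenly" is computed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for i40e_device_t. */
typedef struct dev_entry {
	const void *de_parent;		/* parent node in the /devices tree */
	unsigned int de_bus;		/* PCI bus number */
	unsigned int de_device;		/* PCI device number */
	unsigned int de_nfuncs;		/* number of functions on the device */
	struct dev_entry *de_next;	/* next entry on the global list */
} dev_entry_t;

/* Find the entry that matches all three identifiers, or NULL if none does. */
static dev_entry_t *
dev_find(dev_entry_t *head, const void *parent, unsigned int bus,
    unsigned int device)
{
	dev_entry_t *de;

	for (de = head; de != NULL; de = de->de_next) {
		if (de->de_parent == parent && de->de_bus == bus &&
		    de->de_device == device) {
			return (de);
		}
	}
	return (NULL);
}

/* Split a device-wide resource evenly between its functions. */
static unsigned int
dev_resource_share(unsigned int total, unsigned int nfuncs)
{
	assert(nfuncs > 0);
	return (total / nfuncs);
}
```

Two functions with the same bus and device numbers but different parents would land on different entries, which is why no single identifier is used alone.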
 383 
 384 /*
 385  * Access attributes for register mapping.
 386  */
 387 static ddi_device_acc_attr_t i40e_regs_acc_attr = {
 388         DDI_DEVICE_ATTR_V1,
 389         DDI_STRUCTURE_LE_ACC,
 390         DDI_STRICTORDER_ACC,
 391         DDI_FLAGERR_ACC
 392 };
 393 
 394 /*


 744 
 745                 ddi_fm_fini(i40e->i40e_dip);
 746         }
 747 }
 748 
 749 void
 750 i40e_fm_ereport(i40e_t *i40e, char *detail)
 751 {
 752         uint64_t ena;
 753         char buf[FM_MAX_CLASS];
 754 
 755         (void) snprintf(buf, FM_MAX_CLASS, "%s.%s", DDI_FM_DEVICE, detail);
 756         ena = fm_ena_generate(0, FM_ENA_FMT1);
 757         if (DDI_FM_EREPORT_CAP(i40e->i40e_fm_capabilities)) {
 758                 ddi_fm_ereport_post(i40e->i40e_dip, buf, ena, DDI_NOSLEEP,
 759                     FM_VERSION, DATA_TYPE_UINT8, FM_EREPORT_VERS0, NULL);
 760         }
 761 }
 762 
 763 /*
 764  * Here we're trying to get the ID of the default VSI. In general, when we come
 765  * through and look at this shortly after attach, we expect there to only be a
 766  * single element present, which is the default VSI. Importantly, each PF seems
 767  * to not see any other devices, in part because of the simple switch mode that
 768  * we're using. If, for some reason, we see more artifacts, we'll need to revisit
 769  * what we're doing here.

 770  */
 771 static int
 772 i40e_get_vsi_id(i40e_t *i40e)
 773 {
 774         i40e_hw_t *hw = &i40e->i40e_hw_space;
 775         struct i40e_aqc_get_switch_config_resp *sw_config;
 776         uint8_t aq_buf[I40E_AQ_LARGE_BUF];
 777         uint16_t next = 0;
 778         int rc;
 779 
 780         /* LINTED: E_BAD_PTR_CAST_ALIGN */
 781         sw_config = (struct i40e_aqc_get_switch_config_resp *)aq_buf;
 782         rc = i40e_aq_get_switch_config(hw, sw_config, sizeof (aq_buf), &next,
 783             NULL);
 784         if (rc != I40E_SUCCESS) {
 785                 i40e_error(i40e, "i40e_aq_get_switch_config() failed %d: %d",
 786                     rc, hw->aq.asq_last_status);
 787                 return (-1);
 788         }
 789 
 790         if (LE_16(sw_config->header.num_reported) != 1) {
 791                 i40e_error(i40e, "encountered multiple (%d) switching units "
 792                     "during attach, not proceeding",
 793                     LE_16(sw_config->header.num_reported));
 794                 return (-1);
 795         }
 796 
 797         return (sw_config->element[0].seid);
 798 }
 799 
 800 /*
 801  * We need to fill the i40e_hw_t structure with the capabilities of this PF. We
 802  * must also provide the memory for it; however, we don't need to keep it around
 803  * after the call to the common code. It takes it and parses it into an internal
 804  * structure.
 805  */
 806 static boolean_t
 807 i40e_get_hw_capabilities(i40e_t *i40e, i40e_hw_t *hw)
 808 {
 809         struct i40e_aqc_list_capabilities_element_resp *buf;
 810         int rc;
 811         size_t len;
 812         uint16_t needed;
 813         int nelems = I40E_HW_CAP_DEFAULT;
 814 
 815         len = nelems * sizeof (*buf);
 816 
 817         for (;;) {


1081                 for (i = 0; i < i40e->i40e_intr_count; i++) {
1082                         rc = ddi_intr_disable(i40e->i40e_intr_handles[i]);
1083                         if (rc != DDI_SUCCESS) {
1084                                 i40e_error(i40e,
1085                                     "Failed to disable interrupt %d: %d",
1086                                     i, rc);
1087                                 return (B_FALSE);
1088                         }
1089                 }
1090         }
1091 
1092         return (B_TRUE);
1093 }
1094 
1095 /*
1096  * Free receive & transmit rings.
1097  */
1098 static void
1099 i40e_free_trqpairs(i40e_t *i40e)
1100 {
1101         int i;
1102         i40e_trqpair_t *itrq;
1103 
1104         if (i40e->i40e_trqpairs != NULL) {
1105                 for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
1106                         itrq = &i40e->i40e_trqpairs[i];
1107                         mutex_destroy(&itrq->itrq_rx_lock);
1108                         mutex_destroy(&itrq->itrq_tx_lock);
1109                         mutex_destroy(&itrq->itrq_tcb_lock);
1110 
1111                         /*
1112                          * Should have already been cleaned up by start/stop,
1113                          * etc.
1114                          */
1115                         ASSERT(itrq->itrq_txkstat == NULL);
1116                         ASSERT(itrq->itrq_rxkstat == NULL);
1117                 }
1118 
1119                 kmem_free(i40e->i40e_trqpairs,
1120                     sizeof (i40e_trqpair_t) * i40e->i40e_num_trqpairs);
1121                 i40e->i40e_trqpairs = NULL;
1122         }
1123 
1124         cv_destroy(&i40e->i40e_rx_pending_cv);
1125         mutex_destroy(&i40e->i40e_rx_pending_lock);
1126         mutex_destroy(&i40e->i40e_general_lock);
1127 }
1128 
1129 /*
1130  * Allocate transmit and receive rings, as well as other data structures that we
1131  * need.
1132  */
1133 static boolean_t
1134 i40e_alloc_trqpairs(i40e_t *i40e)
1135 {
1136         int i;
1137         void *mutexpri = DDI_INTR_PRI(i40e->i40e_intr_pri);
1138 
1139         /*
1140          * Now that we have the priority for the interrupts, initialize
1141          * all relevant locks.
1142          */
1143         mutex_init(&i40e->i40e_general_lock, NULL, MUTEX_DRIVER, mutexpri);
1144         mutex_init(&i40e->i40e_rx_pending_lock, NULL, MUTEX_DRIVER, mutexpri);
1145         cv_init(&i40e->i40e_rx_pending_cv, NULL, CV_DRIVER, NULL);
1146 
1147         i40e->i40e_trqpairs = kmem_zalloc(sizeof (i40e_trqpair_t) *
1148             i40e->i40e_num_trqpairs, KM_SLEEP);
1149         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
1150                 i40e_trqpair_t *itrq = &i40e->i40e_trqpairs[i];
1151 
1152                 itrq->itrq_i40e = i40e;
1153                 mutex_init(&itrq->itrq_rx_lock, NULL, MUTEX_DRIVER, mutexpri);
1154                 mutex_init(&itrq->itrq_tx_lock, NULL, MUTEX_DRIVER, mutexpri);
1155                 mutex_init(&itrq->itrq_tcb_lock, NULL, MUTEX_DRIVER, mutexpri);
1156                 itrq->itrq_index = i;
1157         }
1158 
1159         return (B_TRUE);
1160 }
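
The per-queue locks initialized above are subject to the ordering rules in the theory statement: the lowest-numbered queue first, and the RX lock before the TX lock. A minimal userland sketch of those two rules, using POSIX mutexes in place of kmutex_t, might look like the following. The `qpair_t` type and the helper names are hypothetical, and taking both locks of the lower queue before touching the higher queue is an assumption about how the two rules combine.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical stand-in for the per-queue locks in i40e_trqpair_t. */
typedef struct qpair {
	unsigned int qp_index;
	pthread_mutex_t qp_rx_lock;
	pthread_mutex_t qp_tx_lock;
} qpair_t;

/* Take both locks of one queue pair: RX first, then TX. */
static void
qpair_lock(qpair_t *qp)
{
	(void) pthread_mutex_lock(&qp->qp_rx_lock);
	(void) pthread_mutex_lock(&qp->qp_tx_lock);
}

/* Release in the reverse order of acquisition. */
static void
qpair_unlock(qpair_t *qp)
{
	(void) pthread_mutex_unlock(&qp->qp_tx_lock);
	(void) pthread_mutex_unlock(&qp->qp_rx_lock);
}

/* Lock two distinct queue pairs, always starting with the lower index. */
static void
qpair_lock_two(qpair_t *a, qpair_t *b)
{
	assert(a->qp_index != b->qp_index);
	if (a->qp_index > b->qp_index) {
		qpair_t *tmp = a;
		a = b;
		b = tmp;
	}
	qpair_lock(a);
	qpair_lock(b);
}
```

Because every caller sorts by index before locking, two threads that pass the same pair of queues in opposite argument order still acquire the locks in the same sequence.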
1161 
1162 
1163 
1164 /*
1165  * Unless a .conf file already overrode i40e_t structure values, they will
1166  * be 0, and need to be set in conjunction with the now-available HW report.
1167  *
1168  * However, at the moment, we cap all of these resources as we only support a
1169  * single receive ring and a single group.
1170  */
1171 /* ARGSUSED */
1172 static void
1173 i40e_hw_to_instance(i40e_t *i40e, i40e_hw_t *hw)
1174 {
1175         if (i40e->i40e_num_trqpairs == 0) {
1176                 i40e->i40e_num_trqpairs = I40E_TRQPAIR_MAX;
1177         }

1178 
1179         if (i40e->i40e_num_rx_groups == 0) {
1180                 i40e->i40e_num_rx_groups = I40E_GROUP_MAX;
1181         }
1182 }
1183 
1184 /*
1185  * Free any resources required by, or setup by, the Intel common code.
1186  */
1187 static void
1188 i40e_common_code_fini(i40e_t *i40e)
1189 {
1190         i40e_hw_t *hw = &i40e->i40e_hw_space;
1191         int rc;
1192 
1193         rc = i40e_shutdown_lan_hmc(hw);
1194         if (rc != I40E_SUCCESS)
1195                 i40e_error(i40e, "failed to shutdown LAN hmc: %d", rc);
1196 
1197         rc = i40e_shutdown_adminq(hw);


1292                 i40e_error(i40e, "failed to retrieve hardware mac address: %d",
1293                     rc);
1294                 return (B_FALSE);
1295         }
1296 
1297         rc = i40e_validate_mac_addr(hw->mac.addr);
1298         if (rc != 0) {
1299                 i40e_error(i40e, "failed to validate internal mac address: "
1300                     "%d", rc);
1301                 return (B_FALSE);
1302         }
1303         bcopy(hw->mac.addr, hw->mac.perm_addr, ETHERADDRL);
1304         if ((rc = i40e_get_port_mac_addr(hw, hw->mac.port_addr)) !=
1305             I40E_SUCCESS) {
1306                 i40e_error(i40e, "failed to retrieve port mac address: %d",
1307                     rc);
1308                 return (B_FALSE);
1309         }
1310 
1311         /*
1312          * We need to obtain the ID of the Virtual Station Interface (VSI)
1313          * before we can perform other operations on the device.
1314          */
1315         i40e->i40e_vsi_id = i40e_get_vsi_id(i40e);
1316         if (i40e->i40e_vsi_id == -1) {
1317                 i40e_error(i40e, "failed to obtain VSI ID");
1318                 return (B_FALSE);
1319         }
1320 
1321         return (B_TRUE);
1322 }
1323 
1324 static void
1325 i40e_unconfigure(dev_info_t *devinfo, i40e_t *i40e)
1326 {
1327         int rc;
1328 
1329         if (i40e->i40e_attach_progress & I40E_ATTACH_ENABLE_INTR)
1330                 (void) i40e_disable_interrupts(i40e);
1331 
1332         if ((i40e->i40e_attach_progress & I40E_ATTACH_LINK_TIMER) &&
1333             i40e->i40e_periodic_id != 0) {
1334                 ddi_periodic_delete(i40e->i40e_periodic_id);
1335                 i40e->i40e_periodic_id = 0;
1336         }
1337 


1542         i40e->i40e_tx_block_thresh = i40e_get_prop(i40e, "tx_resched_threshold",
1543             I40E_MIN_TX_BLOCK_THRESH,
1544             i40e->i40e_tx_ring_size - I40E_TX_MAX_COOKIE,
1545             I40E_DEF_TX_BLOCK_THRESH);
1546 
1547         i40e->i40e_rx_ring_size = i40e_get_prop(i40e, "rx_ring_size",
1548             I40E_MIN_RX_RING_SIZE, I40E_MAX_RX_RING_SIZE,
1549             I40E_DEF_RX_RING_SIZE);
1550         if ((i40e->i40e_rx_ring_size % I40E_DESC_ALIGN) != 0) {
1551                 i40e->i40e_rx_ring_size = P2ROUNDUP(i40e->i40e_rx_ring_size,
1552                     I40E_DESC_ALIGN);
1553         }
1554 
1555         i40e->i40e_rx_limit_per_intr = i40e_get_prop(i40e, "rx_limit_per_intr",
1556             I40E_MIN_RX_LIMIT_PER_INTR, I40E_MAX_RX_LIMIT_PER_INTR,
1557             I40E_DEF_RX_LIMIT_PER_INTR);
1558 
1559         i40e->i40e_tx_hcksum_enable = i40e_get_prop(i40e, "tx_hcksum_enable",
1560             B_FALSE, B_TRUE, B_TRUE);
1561 



1562         i40e->i40e_rx_hcksum_enable = i40e_get_prop(i40e, "rx_hcksum_enable",
1563             B_FALSE, B_TRUE, B_TRUE);
1564 
1565         i40e->i40e_rx_dma_min = i40e_get_prop(i40e, "rx_dma_threshold",
1566             I40E_MIN_RX_DMA_THRESH, I40E_MAX_RX_DMA_THRESH,
1567             I40E_DEF_RX_DMA_THRESH);
1568 
1569         i40e->i40e_tx_dma_min = i40e_get_prop(i40e, "tx_dma_threshold",
1570             I40E_MIN_TX_DMA_THRESH, I40E_MAX_TX_DMA_THRESH,
1571             I40E_DEF_TX_DMA_THRESH);
1572 
1573         i40e->i40e_tx_itr = i40e_get_prop(i40e, "tx_intr_throttle",
1574             I40E_MIN_ITR, I40E_MAX_ITR, I40E_DEF_TX_ITR);
1575 
1576         i40e->i40e_rx_itr = i40e_get_prop(i40e, "rx_intr_throttle",
1577             I40E_MIN_ITR, I40E_MAX_ITR, I40E_DEF_RX_ITR);
1578 
1579         i40e->i40e_other_itr = i40e_get_prop(i40e, "other_intr_throttle",
1580             I40E_MIN_ITR, I40E_MAX_ITR, I40E_DEF_OTHER_ITR);
1581 
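
The property handling above follows a common pattern: read a tunable, force it into a [min, max] range, and round ring sizes up to the descriptor alignment. The body of i40e_get_prop() is not shown in this listing, so the sketch below is an assumption about that pattern rather than a copy of it. `clamp_prop()` and `roundup_pow2()` are hypothetical helpers, and the alignment must be a power of two, as P2ROUNDUP requires.

```c
#include <assert.h>

/* Force a tunable into the inclusive range [min, max]. */
static int
clamp_prop(int value, int min, int max)
{
	assert(min <= max);
	if (value < min)
		return (min);
	if (value > max)
		return (max);
	return (value);
}

/* Round x up to the next multiple of align; align must be a power of two. */
static unsigned int
roundup_pow2(unsigned int x, unsigned int align)
{
	assert(align != 0 && (align & (align - 1)) == 0);
	return ((x + align - 1) & ~(align - 1));
}
```

For example, with a hypothetical alignment of 32, a requested ring size of 1000 would become 1024, while 1024 would be left unchanged.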


1711 static boolean_t
1712 i40e_alloc_intrs(i40e_t *i40e, dev_info_t *devinfo)
1713 {
1714         int intr_types, rc;
1715         uint_t max_trqpairs;
1716 
1717         if (i40e_is_x722(i40e)) {
1718                 max_trqpairs = I40E_722_MAX_TC_QUEUES;
1719         } else {
1720                 max_trqpairs = I40E_710_MAX_TC_QUEUES;
1721         }
1722 
1723         rc = ddi_intr_get_supported_types(devinfo, &intr_types);
1724         if (rc != DDI_SUCCESS) {
1725                 i40e_error(i40e, "failed to get supported interrupt types: %d",
1726                     rc);
1727                 return (B_FALSE);
1728         }
1729 
1730         i40e->i40e_intr_type = 0;

1731 
1732         if ((intr_types & DDI_INTR_TYPE_MSIX) &&
1733             i40e->i40e_intr_force <= I40E_INTR_MSIX) {
1734                 if (i40e_alloc_intr_handles(i40e, devinfo,
1735                     DDI_INTR_TYPE_MSIX)) {
1736                         i40e->i40e_num_trqpairs =
1737                             MIN(i40e->i40e_intr_count - 1, max_trqpairs);
1738                         return (B_TRUE);
1739                 }
1740         }
1741 
1742         /*
1743          * We only use multiple transmit/receive pairs when MSI-X interrupts are
1744          * available due to the fact that the device basically only supports a
1745          * single MSI interrupt.
1746          */
1747         i40e->i40e_num_trqpairs = I40E_TRQPAIR_NOMSIX;

1748         i40e->i40e_num_rx_groups = I40E_GROUP_NOMSIX;
1749 
1750         if ((intr_types & DDI_INTR_TYPE_MSI) &&
1751             (i40e->i40e_intr_force <= I40E_INTR_MSI)) {
1752                 if (i40e_alloc_intr_handles(i40e, devinfo, DDI_INTR_TYPE_MSI))
1753                         return (B_TRUE);
1754         }
1755 
1756         if (intr_types & DDI_INTR_TYPE_FIXED) {
1757                 if (i40e_alloc_intr_handles(i40e, devinfo, DDI_INTR_TYPE_FIXED))
1758                         return (B_TRUE);
1759         }
1760 
1761         return (B_FALSE);
1762 }
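
The fallback order in i40e_alloc_intrs() (MSI-X, then MSI, then fixed) can be modeled without the DDI as shown below. Everything in this sketch is hypothetical: the `SK_` constants, the `sk_force` values, and the `allocatable` mask, which stands in for whether i40e_alloc_intr_handles() would succeed for a given type. It also assumes that the non-MSI-X case uses exactly one queue pair.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical interrupt type bits and force levels, for illustration only. */
#define	SK_INTR_FIXED	0x1
#define	SK_INTR_MSI	0x2
#define	SK_INTR_MSIX	0x4

enum sk_force {
	SK_FORCE_NONE = 0,
	SK_FORCE_MSIX = 1,
	SK_FORCE_MSI = 2,
	SK_FORCE_LEGACY = 3
};

typedef struct sk_intr_choice {
	int sic_type;			/* chosen type bit, 0 if none */
	unsigned int sic_nqueues;	/* queue pairs to use */
} sk_intr_choice_t;

/*
 * Pick an interrupt type. With MSI-X, one vector is kept for the admin queue
 * and the rest map to queue pairs, capped at max_queues. Otherwise a single
 * queue pair is used.
 */
static bool
sk_choose_intr(int supported, int allocatable, enum sk_force force,
    unsigned int nvectors, unsigned int max_queues, sk_intr_choice_t *out)
{
	out->sic_type = 0;
	out->sic_nqueues = 0;

	if ((supported & SK_INTR_MSIX) && force <= SK_FORCE_MSIX &&
	    (allocatable & SK_INTR_MSIX)) {
		unsigned int nq;

		assert(nvectors >= 2);
		nq = nvectors - 1;
		out->sic_type = SK_INTR_MSIX;
		out->sic_nqueues = (nq < max_queues) ? nq : max_queues;
		return (true);
	}

	out->sic_nqueues = 1;
	if ((supported & SK_INTR_MSI) && force <= SK_FORCE_MSI &&
	    (allocatable & SK_INTR_MSI)) {
		out->sic_type = SK_INTR_MSI;
		return (true);
	}

	if ((supported & SK_INTR_FIXED) && (allocatable & SK_INTR_FIXED)) {
		out->sic_type = SK_INTR_FIXED;
		return (true);
	}

	out->sic_nqueues = 0;
	return (false);
}
```

With nine MSI-X vectors and a cap of 64, the sketch settles on eight queue pairs, leaving one vector for the admin queue.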
1763 
1764 /*
1765  * Map different interrupts to MSI-X vectors.
1766  */
1767 static boolean_t
1768 i40e_map_intrs_to_vectors(i40e_t *i40e)
1769 {
1770         int i;
1771 
1772         if (i40e->i40e_intr_type != DDI_INTR_TYPE_MSIX) {
1773                 return (B_TRUE);
1774         }
1775 
1776         /*
1777          * Each queue pair is mapped to a single interrupt, so transmit
1778          * and receive interrupts for a given queue share the same vector.
1779          * The number of queue pairs is one less than the number of interrupt
1780          * vectors, and each pair is assigned the vector one higher than its index.
1781          * Vector zero is reserved for the admin queue.
1782          */
1783         ASSERT(i40e->i40e_intr_count == i40e->i40e_num_trqpairs + 1);

1784 
1785         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
1786                 i40e->i40e_trqpairs[i].itrq_rx_intrvec = i + 1;
1787                 i40e->i40e_trqpairs[i].itrq_tx_intrvec = i + 1;
1788         }
1789 
1790         return (B_TRUE);
1791 }
1792 
1793 static boolean_t
1794 i40e_add_intr_handlers(i40e_t *i40e)
1795 {
1796         int rc, vector;
1797 
1798         switch (i40e->i40e_intr_type) {
1799         case DDI_INTR_TYPE_MSIX:
1800                 for (vector = 0; vector < i40e->i40e_intr_count; vector++) {
1801                         rc = ddi_intr_add_handler(
1802                             i40e->i40e_intr_handles[vector],
1803                             (ddi_intr_handler_t *)i40e_intr_msix, i40e,
1804                             (void *)(uintptr_t)vector);
1805                         if (rc != DDI_SUCCESS) {
1806                                 i40e_log(i40e, "Add interrupt handler (MSI-X) "
1807                                     "failed: return %d, vector %d", rc, vector);


1906         rc = i40e_aq_set_phy_int_mask(hw, 0, NULL);
1907         if (rc != I40E_SUCCESS) {
1908                 i40e_error(i40e, "failed to update phy link mask: %d", rc);
1909         }
1910 }
1911 
1912 /*
1913  * Go through and re-initialize any existing filters that we may have set up for
1914  * this device. Note that we would only expect them to exist if hardware had
1915  * already been initialized and we had just reset it. While we're not
1916  * implementing this yet, we're keeping this around for when we add reset
1917  * capabilities, so this isn't forgotten.
1918  */
1919 /* ARGSUSED */
1920 static void
1921 i40e_init_macaddrs(i40e_t *i40e, i40e_hw_t *hw)
1922 {
1923 }
1924 
1925 /*
1926  * Configure the hardware for the Virtual Station Interface (VSI).  Currently
1927  * we only support one, but in the future we could instantiate more than one
1928  * per attach-point.
1929  */
1930 static boolean_t
1931 i40e_config_vsi(i40e_t *i40e, i40e_hw_t *hw)

1932 {
1933         struct i40e_vsi_context context;
1934         int err, tc_queues;
1935 
1936         bzero(&context, sizeof (struct i40e_vsi_context));
1937         context.seid = i40e->i40e_vsi_id;
1938         context.pf_num = hw->pf_id;
1939         err = i40e_aq_get_vsi_params(hw, &context, NULL);
1940         if (err != I40E_SUCCESS) {
1941                 i40e_error(i40e, "get VSI params failed with %d", err);
1942                 return (B_FALSE);
1943         }
1944 
1945         i40e->i40e_vsi_num = context.vsi_number;
1946 
1947         /*
1948          * Set the queue and traffic class bits.  Keep it simple for now.


1949          */
1950         context.info.valid_sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
1951         context.info.mapping_flags = I40E_AQ_VSI_QUE_MAP_CONTIG;
1952         context.info.queue_mapping[0] = I40E_ASSIGN_ALL_QUEUES;


1953 
1954         /*
1955          * tc_queues determines the size of the traffic class, where the size is
1956          * 2^^tc_queues to a maximum of 64 for the X710 and 128 for the X722.
1957          *
1958          * Some examples:
1959          *      i40e_num_trqpairs == 1 =>  tc_queues = 0, 2^^0 = 1.
1960          *      i40e_num_trqpairs == 7 =>  tc_queues = 3, 2^^3 = 8.
1961          *      i40e_num_trqpairs == 8 =>  tc_queues = 3, 2^^3 = 8.
1962          *      i40e_num_trqpairs == 9 =>  tc_queues = 4, 2^^4 = 16.
1963          *      i40e_num_trqpairs == 17 => tc_queues = 5, 2^^5 = 32.
1964          *      i40e_num_trqpairs == 64 => tc_queues = 6, 2^^6 = 64.
1965          */
1966         tc_queues = ddi_fls(i40e->i40e_num_trqpairs - 1);
1967 
1968         context.info.tc_mapping[0] = ((0 << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) &
1969             I40E_AQ_VSI_TC_QUE_OFFSET_MASK) |
1970             ((tc_queues << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT) &
1971             I40E_AQ_VSI_TC_QUE_NUMBER_MASK);
1972 
1973         context.info.valid_sections |= I40E_AQ_VSI_PROP_VLAN_VALID;
1974         context.info.port_vlan_flags = I40E_AQ_VSI_PVLAN_MODE_ALL |
1975             I40E_AQ_VSI_PVLAN_EMOD_NOTHING;
1976 
1977         context.flags = LE16_TO_CPU(I40E_AQ_VSI_TYPE_PF);
1978 
1979         i40e->i40e_vsi_stat_id = LE16_TO_CPU(context.info.stat_counter_idx);
1980         if (i40e_stat_vsi_init(i40e) == B_FALSE)
1981                 return (B_FALSE);
1982 
1983         err = i40e_aq_update_vsi_params(hw, &context, NULL);
1984         if (err != I40E_SUCCESS) {
1985                 i40e_error(i40e, "Update VSI params failed with %d", err);
1986                 return (B_FALSE);
1987         }
1988 
1989 
1990         return (B_TRUE);
1991 }
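
Review note: the tc_queues arithmetic in i40e_config_vsi() can be checked in user space with a small sketch. The helper fls_sketch() below is a hypothetical stand-in for the kernel's ddi_fls(), assumed here to return the 1-based index of the most significant set bit (and 0 when no bit is set). tc_queues_for() is likewise an illustrative name, not a driver function.

```c
#include <assert.h>

/*
 * Hypothetical user-space stand-in for ddi_fls(): returns the 1-based index
 * of the most significant set bit, or 0 when no bits are set.
 */
static int
fls_sketch(long mask)
{
	unsigned long m = (unsigned long)mask;
	int idx = 0;

	while (m != 0) {
		idx++;
		m >>= 1;
	}
	return (idx);
}

/*
 * Mirrors the driver's computation: the traffic class spans 2^tc_queues
 * queues, which is the smallest power of two >= num_trqpairs.
 */
static int
tc_queues_for(int num_trqpairs)
{
	assert(num_trqpairs >= 1);
	return (fls_sketch(num_trqpairs - 1));
}
```

Subtracting one before taking the last set bit is what keeps exact powers of two (8, 64) from being rounded up to the next power.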
1992 
1993 /*
1994  * Configure the RSS key. For the X710 controller family, this is set on a
1995  * per-PF basis via registers. For the X722, this is done on a per-VSI basis
1996  * through the admin queue.
1997  */
1998 static boolean_t
1999 i40e_config_rss_key(i40e_t *i40e, i40e_hw_t *hw)
2000 {
2001         uint32_t seed[I40E_PFQF_HKEY_MAX_INDEX + 1];
2002 
2003         (void) random_get_pseudo_bytes((uint8_t *)seed, sizeof (seed));
2004 
2005         if (i40e_is_x722(i40e)) {
2006                 struct i40e_aqc_get_set_rss_key_data key;
2007                 const char *u8seed = (char *)seed;
2008                 enum i40e_status_code status;
2009 
2010                 CTASSERT(sizeof (key) >= (sizeof (key.standard_rss_key) +
2011                     sizeof (key.extended_hash_key)));
2012 
2013                 bcopy(u8seed, key.standard_rss_key,
2014                     sizeof (key.standard_rss_key));
2015                 bcopy(&u8seed[sizeof (key.standard_rss_key)],
2016                     key.extended_hash_key, sizeof (key.extended_hash_key));
2017 
2018                 status = i40e_aq_set_rss_key(hw, i40e->i40e_vsi_num, &key);
2019                 if (status != I40E_SUCCESS) {
2020                         i40e_error(i40e, "failed to set rss key: %d", status);
2021                         return (B_FALSE);
2022                 }
2023         } else {
2024                 uint_t i;
2025                 for (i = 0; i <= I40E_PFQF_HKEY_MAX_INDEX; i++)
2026                         i40e_write_rx_ctl(hw, I40E_PFQF_HKEY(i), seed[i]);
2027         }
2028 
2029         return (B_TRUE);
2030 }
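
Review note: a minimal sketch of how the X722 branch of i40e_config_rss_key() carves one random seed into the two key fields. The sizes below (13 seed words, a 40-byte standard key, a 12-byte extended key) are assumptions made for illustration. The driver takes the real sizes from I40E_PFQF_HKEY_MAX_INDEX and struct i40e_aqc_get_set_rss_key_data. All names ending in _sketch are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define	SEED_WORDS_SKETCH	13	/* assumed: I40E_PFQF_HKEY_MAX_INDEX + 1 */
#define	STD_KEY_LEN_SKETCH	40	/* assumed size of standard_rss_key */
#define	EXT_KEY_LEN_SKETCH	12	/* assumed size of extended_hash_key */

/* Hypothetical stand-in for struct i40e_aqc_get_set_rss_key_data. */
typedef struct rss_key_sketch {
	uint8_t rk_standard[STD_KEY_LEN_SKETCH];
	uint8_t rk_extended[EXT_KEY_LEN_SKETCH];
} rss_key_sketch_t;

/*
 * Copy the leading bytes of the seed into the standard key and the bytes
 * that immediately follow into the extended key.
 */
static void
rss_key_split_sketch(const uint32_t *seed, rss_key_sketch_t *key)
{
	const uint8_t *bytes = (const uint8_t *)seed;

	/* The seed must be large enough to cover both fields. */
	assert(SEED_WORDS_SKETCH * sizeof (uint32_t) >=
	    sizeof (key->rk_standard) + sizeof (key->rk_extended));

	memcpy(key->rk_standard, bytes, sizeof (key->rk_standard));
	memcpy(key->rk_extended, bytes + sizeof (key->rk_standard),
	    sizeof (key->rk_extended));
}
```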
2031 
2032 /*
2033  * Populate the LUT. The size of each entry in the LUT depends on the controller
2034  * family, with the X722 using a known 7-bit width. On the X710 controller, this
2035  * is programmed through its control registers whereas on the X722 this is
2036  * configured through the admin queue. Also of note, the X722 allows the LUT to
2037  * be set on a per-PF or VSI basis. At this time, as we only have a single VSI,
2038  * we use the PF setting as it is the primary VSI.
2039  *
2040  * We populate the LUT in a round robin fashion with the rx queue indices from 0
2041  * to i40e_num_trqpairs - 1.
2042  */
2043 static boolean_t
2044 i40e_config_rss_hlut(i40e_t *i40e, i40e_hw_t *hw)
2045 {
2046         uint32_t *hlut;
2047         uint8_t lut_mask;
2048         uint_t i;
2049         boolean_t ret = B_FALSE;
2050 
2051         /*
2052          * We always configure the PF with a table size of 512 bytes in
2053          * i40e_chip_start().
2054          */
2055         hlut = kmem_alloc(I40E_HLUT_TABLE_SIZE, KM_NOSLEEP);
2056         if (hlut == NULL) {
2057                 i40e_error(i40e, "i40e_config_rss() buffer allocation failed");
2058                 return (B_FALSE);
2059         }
2060 
2061         /*
2062          * The width of the X722 is apparently defined to be 7 bits, regardless
2063          * of the capability.
2064          */
2065         if (i40e_is_x722(i40e)) {
2066                 lut_mask = (1 << 7) - 1;
2067         } else {
2068                 lut_mask = (1 << hw->func_caps.rss_table_entry_width) - 1;
2069         }
2070 
2071         for (i = 0; i < I40E_HLUT_TABLE_SIZE; i++)
2072                 ((uint8_t *)hlut)[i] = (i % i40e->i40e_num_trqpairs) & lut_mask;
2073 
2074         if (i40e_is_x722(i40e)) {
2075                 enum i40e_status_code status;
2076                 status = i40e_aq_set_rss_lut(hw, i40e->i40e_vsi_num, B_TRUE,
2077                     (uint8_t *)hlut, I40E_HLUT_TABLE_SIZE);
2078                 if (status != I40E_SUCCESS) {
2079                         i40e_error(i40e, "failed to set RSS LUT: %d", status);
2080                         goto out;
2081                 }
2082         } else {
2083                 for (i = 0; i < I40E_HLUT_TABLE_SIZE >> 2; i++) {
2084                         I40E_WRITE_REG(hw, I40E_PFQF_HLUT(i), hlut[i]);
2085                 }
2086         }
2087         ret = B_TRUE;
2088 out:
2089         kmem_free(hlut, I40E_HLUT_TABLE_SIZE);
2090         return (ret);
2091 }
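
Review note: the round-robin LUT population in i40e_config_rss_hlut() can be sketched in isolation as follows. HLUT_SKETCH_SIZE and hlut_fill_sketch() are hypothetical names. The 512-byte size is taken from the comment above the allocation, and the entry width is passed in rather than read from hardware capabilities.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	HLUT_SKETCH_SIZE	512	/* assumed: one byte per LUT entry */

/*
 * Fill a byte-wide lookup table round robin with the queue indices
 * 0 .. nqueues - 1, masking every entry down to `width` bits.
 */
static void
hlut_fill_sketch(uint8_t *lut, size_t len, unsigned int nqueues,
    unsigned int width)
{
	uint8_t mask;
	size_t i;

	assert(nqueues >= 1 && width >= 1 && width <= 8);
	mask = (uint8_t)((1u << width) - 1);

	for (i = 0; i < len; i++)
		lut[i] = (uint8_t)((i % nqueues) & mask);
}
```

With three queues the table reads 0, 1, 2, 0, 1, 2, and so on, so hash buckets are spread across the rx queues as evenly as the table size allows.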
2092 
2093 /*
2094  * Set up RSS.
2095  *      1. Seed the hash key.
2096  *      2. Enable PCTYPEs for the hash filter.
2097  *      3. Populate the LUT.
2098  */
2099 static boolean_t


2135         }
2136 
2137         i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), (uint32_t)hena);
2138         i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), (uint32_t)(hena >> 32));
2139 
2140         /*
2141          * 3. Populate LUT
2142          */
2143         return (i40e_config_rss_hlut(i40e, hw));
2144 }
2145 
2146 /*
2147  * Wrapper to kick the chipset on.
2148  */
2149 static boolean_t
2150 i40e_chip_start(i40e_t *i40e)
2151 {
2152         i40e_hw_t *hw = &i40e->i40e_hw_space;
2153         struct i40e_filter_control_settings filter;
2154         int rc;
2155 
2156         if (((hw->aq.fw_maj_ver == 4) && (hw->aq.fw_min_ver < 33)) ||
2157             (hw->aq.fw_maj_ver < 4)) {
2158                 i40e_msec_delay(75);
2159                 if (i40e_aq_set_link_restart_an(hw, TRUE, NULL) !=
2160                     I40E_SUCCESS) {
2161                         i40e_error(i40e, "failed to restart link: admin queue "
2162                             "error: %d", hw->aq.asq_last_status);
2163                         return (B_FALSE);
2164                 }
2165         }
2166 
2167         /* Determine hardware state */
2168         i40e_get_hw_state(i40e, hw);
2169 
2170         /* Initialize mac addresses. */
2171         i40e_init_macaddrs(i40e, hw);
2172 
2173         /*
2174          * Set up the filter control. If the hash lut size is changed from
2175          * I40E_HASH_LUT_SIZE_512 then I40E_HLUT_TABLE_SIZE and
2176          * i40e_config_rss_hlut() will need to be updated.
2177          */
2178         bzero(&filter, sizeof (filter));
2179         filter.enable_ethtype = TRUE;
2180         filter.enable_macvlan = TRUE;
2181         filter.hash_lut_size = I40E_HASH_LUT_SIZE_512;
2182 
2183         rc = i40e_set_filter_control(hw, &filter);
2184         if (rc != I40E_SUCCESS) {
2185                 i40e_error(i40e, "i40e_set_filter_control() returned %d", rc);
2186                 return (B_FALSE);
2187         }
2188 
2189         i40e_intr_chip_init(i40e);
2190 
2191         if (!i40e_config_vsi(i40e, hw))
2192                 return (B_FALSE);
2193 
2194         if (!i40e_config_rss(i40e, hw))
2195                 return (B_FALSE);
2196 
2197         i40e_flush(hw);
2198 
2199         return (B_TRUE);
2200 }
2201 
2202 /*
2203  * Take care of tearing down the rx ring. See 8.3.3.1.2 for more information.
2204  */
2205 static void
2206 i40e_shutdown_rx_rings(i40e_t *i40e)
2207 {
2208         int i;
2209         uint32_t reg;
2210 
2211         i40e_hw_t *hw = &i40e->i40e_hw_space;
2212 
2213         /*


2532         tctx.tphrpacket_ena = I40E_HMC_TX_TPH_DISABLE;
2533         tctx.tphwdesc_ena = I40E_HMC_TX_TPH_DISABLE;
2534         tctx.head_wb_addr = itrq->itrq_desc_area.dmab_dma_address +
2535             sizeof (i40e_tx_desc_t) * itrq->itrq_tx_ring_size;
2536 
2537         /*
2538          * This field isn't actually documented, like crc, but it suggests that
2539          * it should be zeroed. We leave both of these here because of that for
2540          * now. We should check with Intel on why these are here even.
2541          */
2542         tctx.crc = 0;
2543         tctx.rdylist_act = 0;
2544 
2545         /*
2546          * We're supposed to assign the rdylist field with the value of the
2547          * traffic class index for the first device. We query the VSI parameters
2548          * again to get what the handle is. Note that every queue is always
2549          * assigned to traffic class zero, because we don't actually use them.
2550          */
2551         bzero(&context, sizeof (struct i40e_vsi_context));
2552         context.seid = i40e->i40e_vsi_id;
2553         context.pf_num = hw->pf_id;
2554         err = i40e_aq_get_vsi_params(hw, &context, NULL);
2555         if (err != I40E_SUCCESS) {
2556                 i40e_error(i40e, "get VSI params failed with %d", err);
2557                 return (B_FALSE);
2558         }
2559         tctx.rdylist = LE_16(context.info.qs_handle[0]);
2560 
2561         err = i40e_clear_lan_tx_queue_context(hw, itrq->itrq_index);
2562         if (err != I40E_SUCCESS) {
2563                 i40e_error(i40e, "failed to clear tx queue %d context: %d",
2564                     itrq->itrq_index, err);
2565                 return (B_FALSE);
2566         }
2567 
2568         err = i40e_set_lan_tx_queue_context(hw, itrq->itrq_index, &tctx);
2569         if (err != I40E_SUCCESS) {
2570                 i40e_error(i40e, "failed to set tx queue %d context: %d",
2571                     itrq->itrq_index, err);
2572                 return (B_FALSE);


2636                         reg = I40E_READ_REG(hw, I40E_QTX_ENA(i));
2637 
2638                         if (reg & I40E_QTX_ENA_QENA_STAT_MASK)
2639                                 break;
2640                         i40e_msec_delay(I40E_RING_WAIT_PAUSE);
2641                 }
2642 
2643                 if ((reg & I40E_QTX_ENA_QENA_STAT_MASK) == 0) {
2644                         i40e_error(i40e, "failed to enable tx queue %d, timed "
2645                             "out", i);
2646                         return (B_FALSE);
2647                 }
2648         }
2649 
2650         return (B_TRUE);
2651 }
2652 
2653 void
2654 i40e_stop(i40e_t *i40e, boolean_t free_allocations)
2655 {
2656         int i;
2657 
2658         ASSERT(MUTEX_HELD(&i40e->i40e_general_lock));
2659 
2660         /*
2661          * Shutdown and drain the tx and rx pipeline. We do this using the
2662          * following steps.
2663          *
2664          * 1) Shutdown interrupts to all the queues (trying to keep the admin
2665          *    queue alive).
2666          *
2667          * 2) Remove all of the interrupt tx and rx causes by setting the
2668          *    interrupt linked lists to zero.
2669          *
2670          * 3) Shutdown the tx and rx rings. Because i40e_shutdown_rings() should
2671          *    wait for all the queues to be disabled, once we reach that point
2672          *    it should be safe to free associated data.
2673          *
2674          * 4) Wait 50ms after all that is done. This ensures that the rings are
2675          *    ready for programming again and we don't have to think about this
2676          *    in other parts of the driver.
2677          *
2678          * 5) Disable remaining chip interrupts, (admin queue, etc.)
2679          *
2680          * 6) Verify that FM is happy with all the register accesses we
2681          *    performed.
2682          */
2683         i40e_intr_io_disable_all(i40e);
2684         i40e_intr_io_clear_cause(i40e);
2685 
2686         if (i40e_shutdown_rings(i40e) == B_FALSE) {
2687                 ddi_fm_service_impact(i40e->i40e_dip, DDI_SERVICE_LOST);
2688         }
2689 
2690         delay(50 * drv_usectohz(1000));
2691 
2692         i40e_intr_chip_fini(i40e);
2693 
2694         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
2695                 mutex_enter(&i40e->i40e_trqpairs[i].itrq_rx_lock);
2696                 mutex_enter(&i40e->i40e_trqpairs[i].itrq_tx_lock);
2697         }
2698 
2699         /*
2700          * We should consider refactoring this to be part of the ring start /
2701          * stop routines at some point.
2702          */
2703         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
2704                 i40e_stats_trqpair_fini(&i40e->i40e_trqpairs[i]);
2705         }
2706 
2707         if (i40e_check_acc_handle(i40e->i40e_osdep_space.ios_cfg_handle) !=
2708             DDI_FM_OK) {
2709                 ddi_fm_service_impact(i40e->i40e_dip, DDI_SERVICE_LOST);
2710         }
2711 
2712         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
2713                 i40e_tx_cleanup_ring(&i40e->i40e_trqpairs[i]);
2714         }
2715 
2716         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
2717                 mutex_exit(&i40e->i40e_trqpairs[i].itrq_rx_lock);
2718                 mutex_exit(&i40e->i40e_trqpairs[i].itrq_tx_lock);
2719         }
2720 
2721         i40e_stat_vsi_fini(i40e);
2722 
2723         i40e->i40e_link_speed = 0;
2724         i40e->i40e_link_duplex = 0;
2725         i40e_link_state_set(i40e, LINK_STATE_UNKNOWN);
2726 
2727         if (free_allocations) {
2728                 i40e_free_ring_mem(i40e, B_FALSE);
2729         }
2730 }
2731 
2732 boolean_t
2733 i40e_start(i40e_t *i40e, boolean_t alloc)
2734 {
2735         i40e_hw_t *hw = &i40e->i40e_hw_space;
2736         boolean_t rc = B_TRUE;
2737         int i, err;
2738 
2739         ASSERT(MUTEX_HELD(&i40e->i40e_general_lock));
2740 
2741         if (alloc) {


2766         if (!i40e_chip_start(i40e)) {
2767                 i40e_fm_ereport(i40e, DDI_FM_DEVICE_INVAL_STATE);
2768                 rc = B_FALSE;
2769                 goto done;
2770         }
2771 
2772         if (i40e_setup_rx_rings(i40e) == B_FALSE) {
2773                 rc = B_FALSE;
2774                 goto done;
2775         }
2776 
2777         if (i40e_setup_tx_rings(i40e) == B_FALSE) {
2778                 rc = B_FALSE;
2779                 goto done;
2780         }
2781 
2782         /*
2783          * Enable broadcast traffic; however, do not enable multicast traffic.
2784          * That's handled exclusively through MAC's mc_multicst routines.
2785          */
2786         err = i40e_aq_set_vsi_broadcast(hw, i40e->i40e_vsi_id, B_TRUE, NULL);
2787         if (err != I40E_SUCCESS) {
2788                 i40e_error(i40e, "failed to set default VSI: %d", err);
2789                 rc = B_FALSE;
2790                 goto done;
2791         }
2792 
2793         err = i40e_aq_set_mac_config(hw, i40e->i40e_frame_max, B_TRUE, 0, NULL);
2794         if (err != I40E_SUCCESS) {
2795                 i40e_error(i40e, "failed to set MAC config: %d", err);
2796                 rc = B_FALSE;
2797                 goto done;
2798         }
2799 
2800         /*
2801          * Finally, make sure that we're happy from an FM perspective.
2802          */
2803         if (i40e_check_acc_handle(i40e->i40e_osdep_space.ios_reg_handle) !=
2804             DDI_FM_OK) {
2805                 rc = B_FALSE;
2806                 goto done;


   1 /*
   2  * This file and its contents are supplied under the terms of the
   3  * Common Development and Distribution License ("CDDL"), version 1.0.
   4  * You may only use this file in accordance with the terms of version
   5  * 1.0 of the CDDL.
   6  *
   7  * A full copy of the text of the CDDL should have accompanied this
   8  * source.  A copy of the CDDL is also available via the Internet at
   9  * http://www.illumos.org/license/CDDL.
  10  */
  11 
  12 /*
  13  * Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved.
  14  * Copyright 2019 Joyent, Inc.
  15  * Copyright 2017 Tegile Systems, Inc.  All rights reserved.
  16  */
  17 
  18 /*
  19  * i40e - Intel 10/40 Gb Ethernet driver
  20  *
  21  * The i40e driver is the main software device driver for the Intel 40 Gb family
  22  * of devices. Note that these devices come in many flavors with both 40 GbE
  23  * ports and 10 GbE ports. This device is the successor to the 82599 family of
  24  * devices (ixgbe).
  25  *
  26  * Unlike previous generations of Intel 1 GbE and 10 GbE devices, the 40 GbE
  27  * devices defined in the XL710 controller (previously known as Fortville) are a
  28  * rather different beast and have a small switch embedded inside of them. In
  29  * addition, the way that most of the programming is done has been overhauled.
  30  * As opposed to just using PCIe memory mapped registers, it also has an
  31  * administrative queue which is used to communicate with firmware running on
  32  * the chip.
  33  *
  34  * Each physical function in the hardware shows up as a device that this driver


 120  * determining which functions belong to the same device. Nominally one might do
 121  * this by having a nexus driver; however, a prime requirement for a nexus
 122  * driver is identifying the various children and activating them. While it is
 123  * possible to get this information from NVRAM, we would end up duplicating a
 124  * lot of the PCI enumeration logic. Really, at the end of the day, the device
 125  * doesn't give us the traditional identification properties we want from a
 126  * nexus driver.
 127  *
 128  * Instead, we rely on some properties that are guaranteed to be unique. While
 129  * it might be tempting to leverage the PBA or serial number of the device from
 130  * NVRAM, there is nothing that says that two devices can't be mis-programmed to
 131  * have the same values in NVRAM. Instead, we uniquely identify a group of
 132  * functions based on their parent in the /devices tree, their PCI bus and PCI
 133  * function identifiers. Using either on their own may not be sufficient.
 134  *
 135  * For each unique PCI device that we encounter, we'll create a i40e_device_t.
 136  * From there, because we don't have a good way to tell the GLDv3 about sharing
 137  * resources between everything, we'll end up just dividing the resources
 138  * evenly between all of the functions. Longer term, if we don't have to declare
 139  * to the GLDv3 that these resources are shared, then we'll maintain a pool and
 140  * have each PF allocate from the pool in the device, thus if only two of four
 141  * ports are being used, for example, then all of the resources can still be
 142  * used.
 143  *
 144  * -------------------------------------------
 145  * Transmit and Receive Queue Pair Allocations
 146  * -------------------------------------------
 147  *
 148  * NVRAM ends up assigning each PF its own share of the transmit and receive LAN
 149  * queue pairs; we have no way of modifying it, only observing it. From there,
 150  * it's up to us to map these queues to VSIs and VFs. Since we don't support any
 151  * VFs at this time, we only focus on assignments to VSIs.
 152  *
 153  * At the moment, we use a static mapping of transmit/receive queue pairs to a
 154  * given VSI (e.g. rings to a group). Though in the fullness of time, we want to
 155  * make this something which is fully dynamic and take advantage of documented,
 156  * but not yet available functionality for adding filters based on VXLAN and
 157  * other encapsulation technologies.
 158  *
 159  * -------------------------------------
 160  * Broadcast, Multicast, and Promiscuous
 161  * -------------------------------------
 162  *
 163  * As part of the GLDv3, we need to make sure that we can handle receiving
 164  * broadcast and multicast traffic, as well as enabling promiscuous mode when
 165  * requested. GLDv3 requires that all broadcast and multicast traffic be
 166  * retrieved by the default group, i.e. the first one. This is the same thing as
 167  * the default VSI.
 168  *
 169  * To receive broadcast traffic, we enable it through the admin queue, rather
 170  * than use one of our filters for it. For multicast traffic, we reserve a
 171  * certain number of the hash filters and assign them to a given PF. When we
 172  * exceed those, we then switch to using promiscuous mode for multicast traffic.
 173  *
 174  * More specifically, once we exceed the number of filters (indicated because
 175  * the i40e_t`i40e_resources.ifr_nmcastfilt ==
 176  * i40e_t`i40e_resources.ifr_nmcastfilt_used), we then instead need to toggle
 177  * promiscuous mode. If promiscuous mode is toggled then we keep track of the
 178  * number of MACs added to it by incrementing i40e_t`i40e_mcast_promisc_count.
 179  * That will stay enabled until that count reaches zero indicating that we have
 180  * only added multicast addresses that we have a corresponding entry for.
 181  *
 182  * Because MAC itself wants to toggle promiscuous mode, which includes both
 183  * unicast and multicast traffic, we go through and keep track of that
 184  * ourselves. That is maintained through the use of the i40e_t`i40e_promisc_on
 185  * member.
 186  *
 187  * --------------
 188  * VSI Management
 189  * --------------
 190  *
 191  * The PFs share 384 VSIs. The firmware creates one VSI per PF by default.
 192  * During chip start we retrieve the SEID of this VSI and assign it as the
 193  * default VSI for our VEB (one VEB per PF). We then add additional VSIs to
 194  * the VEB up to the determined number of rx groups: i40e_t`i40e_num_rx_groups.
 195  * We currently cap this number to I40E_GROUP_MAX to a) make sure all PFs can
 196  * allocate the same number of VSIs, and b) to keep the interrupt multiplexing
 197  * under control. In the future, when we improve the interrupt allocation, we
 198  * may want to revisit this cap to make better use of the available VSIs. The
 199  * VSI allocation and configuration can be found in i40e_chip_start().
 200  *
 201  * ----------------
 202  * Structure Layout
 203  * ----------------
 204  *
 205  * The following image relates the core data structures together. The primary
 206  * structure in the system is the i40e_t. It itself contains multiple rings,
 207  * i40e_trqpair_t's which contain the various transmit and receive data. The
 208  * receive data is stored outside of the i40e_trqpair_t and instead in the
 209  * i40e_rx_data_t. The i40e_t has a corresponding i40e_device_t which keeps
 210  * track of per-physical device state. Finally, for every active descriptor,
 211  * there is a corresponding control block, which is where the
 212  * i40e_rx_control_block_t and the i40e_tx_control_block_t come from.
 213  *
 214  *   +-----------------------+       +-----------------------+
 215  *   | Global i40e_t list    |       | Global Device list    |
 216  *   |                       |    +--|                       |
 217  *   | i40e_glist            |    |  | i40e_dlist            |
 218  *   +-----------------------+    |  +-----------------------+
 219  *       |                        v


 224  *       |      | dev_info_t *     ------+--> Parent in devices tree.
 225  *       |      | uint_t           ------+--> PCI bus number
 226  *       |      | uint_t           ------+--> PCI device number
 227  *       |      | uint_t           ------+--> Number of functions
 228  *       |      | i40e_switch_rsrcs_t ---+--> Captured total switch resources
 229  *       |      | list_t           ------+-------------+
 230  *       |      +------------------------+             |
 231  *       |                           ^                 |
 232  *       |                           +--------+        |
 233  *       |                                    |        v
 234  *       |  +---------------------------+     |   +-------------------+
 235  *       +->| GLDv3 Device, per PF      |-----|-->| GLDv3 Device (PF) |--> ...
 236  *          | i40e_t                    |     |   | i40e_t            |
 237  *          | **Primary Structure**     |     |   +-------------------+
 238  *          |                           |     |
 239  *          | i40e_device_t *         --+-----+
 240  *          | i40e_state_t            --+---> Device State
 241  *          | i40e_hw_t               --+---> Intel common code structure
 242  *          | mac_handle_t            --+---> GLDv3 handle to MAC
 243  *          | ddi_periodic_t          --+---> Link activity timer
 244  *          | i40e_vsi_t *            --+---> Array of VSIs
 245  *          | i40e_func_rsrc_t        --+---> Available hardware resources
 246  *          | i40e_switch_rsrc_t *    --+---> Switch resource snapshot
 247  *          | i40e_sdu                --+---> Current MTU
 248  *          | i40e_frame_max          --+---> Current HW frame size
 249  *          | i40e_uaddr_t *          --+---> Array of assigned unicast MACs
 250  *          | i40e_maddr_t *          --+---> Array of assigned multicast MACs
 251  *          | i40e_mcast_promisccount --+---> Active multicast state
 252  *          | i40e_promisc_on         --+---> Current promiscuous mode state
 253  *          | uint_t                  --+---> Number of transmit/receive pairs
 254  *          | i40e_rx_group_t *       --+---> Array of Rx groups
 255  *          | kstat_t *               --+---> PF kstats
 256  *          | i40e_pf_stats_t         --+---> PF kstat backing data
 257  *          | i40e_trqpair_t *        --+---------+
 258  *          +---------------------------+         |
 259  *                                                |
 260  *                                                v
 261  *  +-------------------------------+       +-----------------------------+
 262  *  | Transmit/Receive Queue Pair   |-------| Transmit/Receive Queue Pair |->...
 263  *  | i40e_trqpair_t                |       | i40e_trqpair_t              |
 264  *  + Ring Data Structure           |       +-----------------------------+
 265  *  |                               |
 266  *  | mac_ring_handle_t             +--> MAC RX ring handle
 267  *  | mac_ring_handle_t             +--> MAC TX ring handle
 268  *  | i40e_rxq_stat_t             --+--> RX Queue stats
 269  *  | i40e_txq_stat_t             --+--> TX Queue stats
 270  *  | uint32_t (tx ring size)       +--> TX Ring Size
 271  *  | uint32_t (tx free list size)  +--> TX Free List Size
 272  *  | i40e_dma_buffer_t     --------+--> TX Descriptor ring DMA
 273  *  | i40e_tx_desc_t *      --------+--> TX descriptor ring
 274  *  | volatile uint32_t *           +--> TX Write back head
 275  *  | uint32_t               -------+--> TX ring head
 276  *  | uint32_t               -------+--> TX ring tail


 342  *
 343  * 2) When grabbing locks between multiple transmit and receive queues, the
 344  * locks for the lowest number transmit/receive queue should be grabbed first.
 345  *
 346  * 3) When grabbing both the transmit and receive lock for a given queue, always
 347  * grab i40e_trqpair_t`itrq_rx_lock before the i40e_trqpair_t`itrq_tx_lock.
 348  *
 349  * 4) The following pairs of locks are not expected to be held at the same time:
 350  *
 351  * o i40e_t`i40e_rx_pending_lock and i40e_trqpair_t`itrq_tcb_lock
 352  *
 353  * -----------
 354  * Future Work
 355  * -----------
 356  *
 357  * At the moment the i40e driver is rather bare bones, allowing us to start
 358  * getting data flowing and folks using it while we develop additional features.
 359  * While bugs have been filed to cover this future work, the following gives an
 360  * overview of expected work:
 361  *
 362  *  o DMA binding and breaking up the locking in ring recycling.
 363  *  o Enhanced detection of device errors
 364  *  o Participation in IRM
 365  *  o FMA device reset
 366  *  o Stall detection, temperature error detection, etc.
 367  *  o More dynamic resource pools
 368  */
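
Review note: the multicast accounting described under "Broadcast, Multicast, and Promiscuous" in the block comment above can be sketched roughly as below. This is an illustrative model only and not the driver's implementation. mcast_sketch_t and its functions are hypothetical names, with fields loosely standing in for ifr_nmcastfilt, ifr_nmcastfilt_used and i40e_mcast_promisc_count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical per-PF multicast accounting state. */
typedef struct mcast_sketch {
	unsigned int ms_nfilt;		/* filters available to this PF */
	unsigned int ms_nfilt_used;	/* filters currently in use */
	unsigned int ms_promisc_count;	/* addresses relying on promisc */
} mcast_sketch_t;

/*
 * Add one multicast address. Returns true when it consumed a hardware
 * filter, false when the filters were exhausted and the address instead
 * relies on multicast promiscuous mode.
 */
static bool
mcast_add_sketch(mcast_sketch_t *ms)
{
	if (ms->ms_nfilt_used < ms->ms_nfilt) {
		ms->ms_nfilt_used++;
		return (true);
	}
	ms->ms_promisc_count++;
	return (false);
}

/* Remove an address; had_filter is the value mcast_add_sketch() returned. */
static void
mcast_remove_sketch(mcast_sketch_t *ms, bool had_filter)
{
	if (had_filter) {
		assert(ms->ms_nfilt_used > 0);
		ms->ms_nfilt_used--;
	} else {
		assert(ms->ms_promisc_count > 0);
		ms->ms_promisc_count--;
	}
}

/* Multicast promiscuous mode stays on until the count drops to zero. */
static bool
mcast_promisc_needed_sketch(const mcast_sketch_t *ms)
{
	return (ms->ms_promisc_count > 0);
}
```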
 369 
 370 #include "i40e_sw.h"
 371 
 372 static char i40e_ident[] = "Intel 10/40Gb Ethernet v1.0.3";
 373 
 374 /*
 375  * The i40e_glock primarily protects the lists below and the i40e_device_t
 376  * structures.
 377  */
 378 static kmutex_t i40e_glock;
 379 static list_t i40e_glist;
 380 static list_t i40e_dlist;
 381 
 382 /*
 383  * Access attributes for register mapping.
 384  */
 385 static ddi_device_acc_attr_t i40e_regs_acc_attr = {
 386         DDI_DEVICE_ATTR_V1,
 387         DDI_STRUCTURE_LE_ACC,
 388         DDI_STRICTORDER_ACC,
 389         DDI_FLAGERR_ACC
 390 };
 391 
 392 /*


 742 
 743                 ddi_fm_fini(i40e->i40e_dip);
 744         }
 745 }
 746 
 747 void
 748 i40e_fm_ereport(i40e_t *i40e, char *detail)
 749 {
 750         uint64_t ena;
 751         char buf[FM_MAX_CLASS];
 752 
 753         (void) snprintf(buf, FM_MAX_CLASS, "%s.%s", DDI_FM_DEVICE, detail);
 754         ena = fm_ena_generate(0, FM_ENA_FMT1);
 755         if (DDI_FM_EREPORT_CAP(i40e->i40e_fm_capabilities)) {
 756                 ddi_fm_ereport_post(i40e->i40e_dip, buf, ena, DDI_NOSLEEP,
 757                     FM_VERSION, DATA_TYPE_UINT8, FM_EREPORT_VERS0, NULL);
 758         }
 759 }
 760 
 761 /*
 762  * Here we're trying to set the SEID of the default VSI. In general,
 763  * when we come through and look at this shortly after attach, we
 764  * expect there to only be a single element present, which is the
 765  * default VSI. Importantly, each PF seems to not see any other
 766  * devices, in part because of the simple switch mode that we're
 767  * using. If for some reason, we see more artifacts, we'll need to
 768  * revisit what we're doing here.
 769  */
 770 static boolean_t
 771 i40e_set_def_vsi_seid(i40e_t *i40e)
 772 {
 773         i40e_hw_t *hw = &i40e->i40e_hw_space;
 774         struct i40e_aqc_get_switch_config_resp *sw_config;
 775         uint8_t aq_buf[I40E_AQ_LARGE_BUF];
 776         uint16_t next = 0;
 777         int rc;
 778 
 779         /* LINTED: E_BAD_PTR_CAST_ALIGN */
 780         sw_config = (struct i40e_aqc_get_switch_config_resp *)aq_buf;
 781         rc = i40e_aq_get_switch_config(hw, sw_config, sizeof (aq_buf), &next,
 782             NULL);
 783         if (rc != I40E_SUCCESS) {
 784                 i40e_error(i40e, "i40e_aq_get_switch_config() failed %d: %d",
 785                     rc, hw->aq.asq_last_status);
 786                 return (B_FALSE);
 787         }
 788 
 789         if (LE_16(sw_config->header.num_reported) != 1) {
 790                 i40e_error(i40e, "encountered multiple (%d) switching units "
 791                     "during attach, not proceeding",
 792                     LE_16(sw_config->header.num_reported));
 793                 return (B_FALSE);
 794         }
 795 
 796         I40E_DEF_VSI_SEID(i40e) = sw_config->element[0].seid;
 797         return (B_TRUE);
 798 }
 799 
 800 /*
 801  * Get the SEID of the uplink MAC.
 802  */
 803 static int
 804 i40e_get_mac_seid(i40e_t *i40e)
 805 {
 806         i40e_hw_t *hw = &i40e->i40e_hw_space;
 807         struct i40e_aqc_get_switch_config_resp *sw_config;
 808         uint8_t aq_buf[I40E_AQ_LARGE_BUF];
 809         uint16_t next = 0;
 810         int rc;
 811 
 812         /* LINTED: E_BAD_PTR_CAST_ALIGN */
 813         sw_config = (struct i40e_aqc_get_switch_config_resp *)aq_buf;
 814         rc = i40e_aq_get_switch_config(hw, sw_config, sizeof (aq_buf), &next,
 815             NULL);
 816         if (rc != I40E_SUCCESS) {
 817                 i40e_error(i40e, "i40e_aq_get_switch_config() failed %d: %d",
 818                     rc, hw->aq.asq_last_status);
 819                 return (-1);
 820         }
 821 
 822         return (LE_16(sw_config->element[0].uplink_seid));
 823 }
 824 
 825 /*
 826  * We need to fill the i40e_hw_t structure with the capabilities of this PF. We
 827  * must also provide the memory for it; however, we don't need to keep it around
 828  * after the call to the common code, which parses it into an internal
 829  * structure.
 830  */
 831 static boolean_t
 832 i40e_get_hw_capabilities(i40e_t *i40e, i40e_hw_t *hw)
 833 {
 834         struct i40e_aqc_list_capabilities_element_resp *buf;
 835         int rc;
 836         size_t len;
 837         uint16_t needed;
 838         int nelems = I40E_HW_CAP_DEFAULT;
 839 
 840         len = nelems * sizeof (*buf);
 841 
 842         for (;;) {


1106                 for (i = 0; i < i40e->i40e_intr_count; i++) {
1107                         rc = ddi_intr_disable(i40e->i40e_intr_handles[i]);
1108                         if (rc != DDI_SUCCESS) {
1109                                 i40e_error(i40e,
1110                                     "Failed to disable interrupt %d: %d",
1111                                     i, rc);
1112                                 return (B_FALSE);
1113                         }
1114                 }
1115         }
1116 
1117         return (B_TRUE);
1118 }
1119 
1120 /*
1121  * Free receive & transmit rings.
1122  */
1123 static void
1124 i40e_free_trqpairs(i40e_t *i40e)
1125 {
1126         i40e_trqpair_t *itrq;
1127 
1128         if (i40e->i40e_rx_groups != NULL) {
1129                 kmem_free(i40e->i40e_rx_groups,
1130                     sizeof (i40e_rx_group_t) * i40e->i40e_num_rx_groups);
1131                 i40e->i40e_rx_groups = NULL;
1132         }
1133 
1134         if (i40e->i40e_trqpairs != NULL) {
1135                 for (uint_t i = 0; i < i40e->i40e_num_trqpairs; i++) {
1136                         itrq = &i40e->i40e_trqpairs[i];
1137                         mutex_destroy(&itrq->itrq_rx_lock);
1138                         mutex_destroy(&itrq->itrq_tx_lock);
1139                         mutex_destroy(&itrq->itrq_tcb_lock);
1140 
1141                         /*
1142                          * Should have already been cleaned up by start/stop,
1143                          * etc.
1144                          */
1145                         ASSERT(itrq->itrq_txkstat == NULL);
1146                         ASSERT(itrq->itrq_rxkstat == NULL);
1147                 }
1148 
1149                 kmem_free(i40e->i40e_trqpairs,
1150                     sizeof (i40e_trqpair_t) * i40e->i40e_num_trqpairs);
1151                 i40e->i40e_trqpairs = NULL;
1152         }
1153 
1154         cv_destroy(&i40e->i40e_rx_pending_cv);
1155         mutex_destroy(&i40e->i40e_rx_pending_lock);
1156         mutex_destroy(&i40e->i40e_general_lock);
1157 }
1158 
1159 /*
1160  * Allocate transmit and receive rings, as well as other data structures that we
1161  * need.
1162  */
1163 static boolean_t
1164 i40e_alloc_trqpairs(i40e_t *i40e)
1165 {
1166         void *mutexpri = DDI_INTR_PRI(i40e->i40e_intr_pri);
1167 
1168         /*
1169          * Now that we have the priority for the interrupts, initialize
1170          * all relevant locks.
1171          */
1172         mutex_init(&i40e->i40e_general_lock, NULL, MUTEX_DRIVER, mutexpri);
1173         mutex_init(&i40e->i40e_rx_pending_lock, NULL, MUTEX_DRIVER, mutexpri);
1174         cv_init(&i40e->i40e_rx_pending_cv, NULL, CV_DRIVER, NULL);
1175 
1176         i40e->i40e_trqpairs = kmem_zalloc(sizeof (i40e_trqpair_t) *
1177             i40e->i40e_num_trqpairs, KM_SLEEP);
1178         for (uint_t i = 0; i < i40e->i40e_num_trqpairs; i++) {
1179                 i40e_trqpair_t *itrq = &i40e->i40e_trqpairs[i];
1180 
1181                 itrq->itrq_i40e = i40e;
1182                 mutex_init(&itrq->itrq_rx_lock, NULL, MUTEX_DRIVER, mutexpri);
1183                 mutex_init(&itrq->itrq_tx_lock, NULL, MUTEX_DRIVER, mutexpri);
1184                 mutex_init(&itrq->itrq_tcb_lock, NULL, MUTEX_DRIVER, mutexpri);
1185                 itrq->itrq_index = i;
1186         }
1187 
1188         i40e->i40e_rx_groups = kmem_zalloc(sizeof (i40e_rx_group_t) *
1189             i40e->i40e_num_rx_groups, KM_SLEEP);
1190 
1191         for (uint_t i = 0; i < i40e->i40e_num_rx_groups; i++) {
1192                 i40e_rx_group_t *rxg = &i40e->i40e_rx_groups[i];
1193 
1194                 rxg->irg_index = i;
1195                 rxg->irg_i40e = i40e;
1196         }
1197 
1198         return (B_TRUE);
1199 }
1200 
1201 
1202 
1203 /*
1204  * Unless a .conf file already overrode i40e_t structure values, they will
1205  * be 0, and need to be set in conjunction with the now-available HW report.
1206  */
1207 /* ARGSUSED */
1208 static void
1209 i40e_hw_to_instance(i40e_t *i40e, i40e_hw_t *hw)
1210 {
1211         if (i40e->i40e_num_trqpairs_per_vsi == 0) {
1212                 if (i40e_is_x722(i40e)) {
1213                         i40e->i40e_num_trqpairs_per_vsi =
1214                             I40E_722_MAX_TC_QUEUES;
1215                 } else {
1216                         i40e->i40e_num_trqpairs_per_vsi =
1217                             I40E_710_MAX_TC_QUEUES;
1218                 }
1219         }
1220 
1221         if (i40e->i40e_num_rx_groups == 0) {
1222                 i40e->i40e_num_rx_groups = I40E_GROUP_MAX;
1223         }
1224 }
1225 
1226 /*
1227  * Free any resources required by, or setup by, the Intel common code.
1228  */
1229 static void
1230 i40e_common_code_fini(i40e_t *i40e)
1231 {
1232         i40e_hw_t *hw = &i40e->i40e_hw_space;
1233         int rc;
1234 
1235         rc = i40e_shutdown_lan_hmc(hw);
1236         if (rc != I40E_SUCCESS)
1237                 i40e_error(i40e, "failed to shutdown LAN hmc: %d", rc);
1238 
1239         rc = i40e_shutdown_adminq(hw);


1334                 i40e_error(i40e, "failed to retrieve hardware mac address: %d",
1335                     rc);
1336                 return (B_FALSE);
1337         }
1338 
1339         rc = i40e_validate_mac_addr(hw->mac.addr);
1340         if (rc != 0) {
1341                 i40e_error(i40e, "failed to validate internal mac address: "
1342                     "%d", rc);
1343                 return (B_FALSE);
1344         }
1345         bcopy(hw->mac.addr, hw->mac.perm_addr, ETHERADDRL);
1346         if ((rc = i40e_get_port_mac_addr(hw, hw->mac.port_addr)) !=
1347             I40E_SUCCESS) {
1348                 i40e_error(i40e, "failed to retrieve port mac address: %d",
1349                     rc);
1350                 return (B_FALSE);
1351         }
1352 
1353         /*
1354          * We need the SEID of the Default Virtual Station Interface
1355          * (VSI) before we can perform other operations on the device.
1356          */
1357         if (!i40e_set_def_vsi_seid(i40e)) {
1358                 i40e_error(i40e, "failed to obtain Default VSI SEID");
1359                 return (B_FALSE);
1360         }
1361 
1362         return (B_TRUE);
1363 }
1364 
1365 static void
1366 i40e_unconfigure(dev_info_t *devinfo, i40e_t *i40e)
1367 {
1368         int rc;
1369 
1370         if (i40e->i40e_attach_progress & I40E_ATTACH_ENABLE_INTR)
1371                 (void) i40e_disable_interrupts(i40e);
1372 
1373         if ((i40e->i40e_attach_progress & I40E_ATTACH_LINK_TIMER) &&
1374             i40e->i40e_periodic_id != 0) {
1375                 ddi_periodic_delete(i40e->i40e_periodic_id);
1376                 i40e->i40e_periodic_id = 0;
1377         }
1378 


1583         i40e->i40e_tx_block_thresh = i40e_get_prop(i40e, "tx_resched_threshold",
1584             I40E_MIN_TX_BLOCK_THRESH,
1585             i40e->i40e_tx_ring_size - I40E_TX_MAX_COOKIE,
1586             I40E_DEF_TX_BLOCK_THRESH);
1587 
1588         i40e->i40e_rx_ring_size = i40e_get_prop(i40e, "rx_ring_size",
1589             I40E_MIN_RX_RING_SIZE, I40E_MAX_RX_RING_SIZE,
1590             I40E_DEF_RX_RING_SIZE);
1591         if ((i40e->i40e_rx_ring_size % I40E_DESC_ALIGN) != 0) {
1592                 i40e->i40e_rx_ring_size = P2ROUNDUP(i40e->i40e_rx_ring_size,
1593                     I40E_DESC_ALIGN);
1594         }
1595 
1596         i40e->i40e_rx_limit_per_intr = i40e_get_prop(i40e, "rx_limit_per_intr",
1597             I40E_MIN_RX_LIMIT_PER_INTR, I40E_MAX_RX_LIMIT_PER_INTR,
1598             I40E_DEF_RX_LIMIT_PER_INTR);
1599 
1600         i40e->i40e_tx_hcksum_enable = i40e_get_prop(i40e, "tx_hcksum_enable",
1601             B_FALSE, B_TRUE, B_TRUE);
1602 
1603         i40e->i40e_tx_lso_enable = i40e_get_prop(i40e, "tx_lso_enable",
1604             B_FALSE, B_TRUE, B_TRUE);
1605 
1606         i40e->i40e_rx_hcksum_enable = i40e_get_prop(i40e, "rx_hcksum_enable",
1607             B_FALSE, B_TRUE, B_TRUE);
1608 
1609         i40e->i40e_rx_dma_min = i40e_get_prop(i40e, "rx_dma_threshold",
1610             I40E_MIN_RX_DMA_THRESH, I40E_MAX_RX_DMA_THRESH,
1611             I40E_DEF_RX_DMA_THRESH);
1612 
1613         i40e->i40e_tx_dma_min = i40e_get_prop(i40e, "tx_dma_threshold",
1614             I40E_MIN_TX_DMA_THRESH, I40E_MAX_TX_DMA_THRESH,
1615             I40E_DEF_TX_DMA_THRESH);
1616 
1617         i40e->i40e_tx_itr = i40e_get_prop(i40e, "tx_intr_throttle",
1618             I40E_MIN_ITR, I40E_MAX_ITR, I40E_DEF_TX_ITR);
1619 
1620         i40e->i40e_rx_itr = i40e_get_prop(i40e, "rx_intr_throttle",
1621             I40E_MIN_ITR, I40E_MAX_ITR, I40E_DEF_RX_ITR);
1622 
1623         i40e->i40e_other_itr = i40e_get_prop(i40e, "other_intr_throttle",
1624             I40E_MIN_ITR, I40E_MAX_ITR, I40E_DEF_OTHER_ITR);
1625 


1755 static boolean_t
1756 i40e_alloc_intrs(i40e_t *i40e, dev_info_t *devinfo)
1757 {
1758         int intr_types, rc;
1759         uint_t max_trqpairs;
1760 
1761         if (i40e_is_x722(i40e)) {
1762                 max_trqpairs = I40E_722_MAX_TC_QUEUES;
1763         } else {
1764                 max_trqpairs = I40E_710_MAX_TC_QUEUES;
1765         }
1766 
1767         rc = ddi_intr_get_supported_types(devinfo, &intr_types);
1768         if (rc != DDI_SUCCESS) {
1769                 i40e_error(i40e, "failed to get supported interrupt types: %d",
1770                     rc);
1771                 return (B_FALSE);
1772         }
1773 
1774         i40e->i40e_intr_type = 0;
1775         i40e->i40e_num_rx_groups = I40E_GROUP_MAX;
1776 
1777         /*
1778          * We need to determine the number of queue pairs per traffic
1779          * class. We only have one traffic class (TC0), so we'll base
1780          * this off the number of interrupts provided. Furthermore,
1781          * since we only use one traffic class, the number of queues
1782          * per traffic class and per VSI are the same.
1783          */
1784         if ((intr_types & DDI_INTR_TYPE_MSIX) &&
1785             (i40e->i40e_intr_force <= I40E_INTR_MSIX) &&
1786             (i40e_alloc_intr_handles(i40e, devinfo, DDI_INTR_TYPE_MSIX))) {
1787                 uint32_t n;
1788 
1789                 /*
1790                  * While we want the number of queue pairs to match
1791                  * the number of interrupts, we must stay within the
1792                  * bounds of the maximum number of queues per traffic
1793                  * class. We subtract one from i40e_intr_count to
1794                  * account for interrupt zero, which is currently
1795                  * restricted to admin queue commands and other
1796                  * interrupt causes.
1797                  */
1798                 n = MIN(i40e->i40e_intr_count - 1, max_trqpairs);
1799                 ASSERT3U(n, >, 0);
1800 
1801                 /*
1802                  * Round up to the nearest power of two to ensure that
1803                  * the QBASE aligns with the TC size which must be
1804                  * programmed as a power of two. See the queue mapping
1805                  * description in section 7.4.9.5.5.1.
1806                  *
1807                  * If i40e_intr_count - 1 is not a power of two then
1808                  * some queue pairs on the same VSI will have to share
1809                  * an interrupt.
1810                  *
1811                  * We may want to revisit this logic in a future where
1812                  * we have more interrupts and more VSIs. Otherwise,
1813                  * each VSI will use as many interrupts as possible.
1814                  * Using more QPs per VSI means better RSS for each
1815                  * group, but at the same time may require more
1816                  * sharing of interrupts across VSIs. This may be a
1817                  * good candidate for a .conf tunable.
1818                  */
1819                 n = 0x1 << ddi_fls(n);
1820                 i40e->i40e_num_trqpairs_per_vsi = n;
1821                 ASSERT3U(i40e->i40e_num_rx_groups, >, 0);
1822                 i40e->i40e_num_trqpairs = i40e->i40e_num_trqpairs_per_vsi *
1823                     i40e->i40e_num_rx_groups;
1824                 return (B_TRUE);
1825         }
1826 
1827         /*
1828          * We only use multiple transmit/receive pairs when MSI-X interrupts are
1829          * available due to the fact that the device basically only supports a
1830          * single MSI interrupt.
1831          */
1832         i40e->i40e_num_trqpairs = I40E_TRQPAIR_NOMSIX;
1833         i40e->i40e_num_trqpairs_per_vsi = i40e->i40e_num_trqpairs;
1834         i40e->i40e_num_rx_groups = I40E_GROUP_NOMSIX;
1835 
1836         if ((intr_types & DDI_INTR_TYPE_MSI) &&
1837             (i40e->i40e_intr_force <= I40E_INTR_MSI)) {
1838                 if (i40e_alloc_intr_handles(i40e, devinfo, DDI_INTR_TYPE_MSI))
1839                         return (B_TRUE);
1840         }
1841 
1842         if (intr_types & DDI_INTR_TYPE_FIXED) {
1843                 if (i40e_alloc_intr_handles(i40e, devinfo, DDI_INTR_TYPE_FIXED))
1844                         return (B_TRUE);
1845         }
1846 
1847         return (B_FALSE);
1848 }
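
As a rough illustration of the sizing arithmetic in i40e_alloc_intrs(), the sketch below evaluates the expression 0x1 << ddi_fls(n) for a given interrupt count. fls_sketch() and qps_per_vsi() are hypothetical stand-ins that do not exist in the driver, and the ddi_fls() semantics (1-based position of the highest set bit, 0 when no bit is set) are assumed from the tc_queues examples in i40e_set_shared_vsi_props(). Under that assumption the expression yields the next power of two strictly greater than n, so an n that is already a power of two is doubled.

```c
#include <assert.h>

/*
 * Assumed stand-in for ddi_fls(): 1-based position of the most
 * significant set bit, or 0 when no bit is set.
 */
static unsigned int
fls_sketch(unsigned long v)
{
	unsigned int pos = 0;

	while (v != 0) {
		pos++;
		v >>= 1;
	}
	return (pos);
}

/*
 * Hypothetical helper mirroring the MSI-X branch: reserve vector zero,
 * clamp to the per-TC maximum, then apply 1 << fls(n).
 */
static unsigned int
qps_per_vsi(unsigned int intr_count, unsigned int max_qps)
{
	unsigned int n;

	assert(intr_count > 1 && max_qps > 0);
	n = intr_count - 1;
	if (n > max_qps)
		n = max_qps;
	return (1u << fls_sketch(n));
}
```

With 8 interrupts (n = 7) this gives 8 queue pairs per VSI, while 9 interrupts (n = 8) gives 16, which may be worth keeping in mind when reading the "round up" comment above.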
1849 
1850 /*
1851  * Map different interrupts to MSI-X vectors.
1852  */
1853 static boolean_t
1854 i40e_map_intrs_to_vectors(i40e_t *i40e)
1855 {
1856         if (i40e->i40e_intr_type != DDI_INTR_TYPE_MSIX) {
1857                 return (B_TRUE);
1858         }
1859 
1860         /*
1861          * Each queue pair is mapped to a single interrupt, so
1862          * transmit and receive interrupts for a given queue share the
1863          * same vector. Vector zero is reserved for the admin queue.
1864          */
1865         for (uint_t i = 0; i < i40e->i40e_num_trqpairs; i++) {
1866                 uint_t vector = i % (i40e->i40e_intr_count - 1);
1867 
1868                 i40e->i40e_trqpairs[i].itrq_rx_intrvec = vector + 1;
1869                 i40e->i40e_trqpairs[i].itrq_tx_intrvec = vector + 1;
1870         }
1871 
1872         return (B_TRUE);
1873 }
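
The mapping loop above can be read as a small pure function. This is a minimal sketch, assuming only what the loop shows; qp_to_vector() is a hypothetical name, not a driver function. Queue pairs cycle over vectors 1 through intr_count - 1, leaving vector zero to the admin queue, so queue pairs share a vector once there are more queue pairs than remaining vectors.

```c
#include <assert.h>

/*
 * Hypothetical helper: MSI-X vector used by queue pair `qp` when
 * `intr_count` vectors were allocated. Vector 0 is never returned.
 */
static unsigned int
qp_to_vector(unsigned int qp, unsigned int intr_count)
{
	/* At least one vector beyond vector zero is required. */
	assert(intr_count > 1);
	return ((qp % (intr_count - 1)) + 1);
}
```

For example, with four vectors, queue pairs 0, 1, and 2 land on vectors 1, 2, and 3, and queue pair 3 wraps back to vector 1.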
1874 
1875 static boolean_t
1876 i40e_add_intr_handlers(i40e_t *i40e)
1877 {
1878         int rc, vector;
1879 
1880         switch (i40e->i40e_intr_type) {
1881         case DDI_INTR_TYPE_MSIX:
1882                 for (vector = 0; vector < i40e->i40e_intr_count; vector++) {
1883                         rc = ddi_intr_add_handler(
1884                             i40e->i40e_intr_handles[vector],
1885                             (ddi_intr_handler_t *)i40e_intr_msix, i40e,
1886                             (void *)(uintptr_t)vector);
1887                         if (rc != DDI_SUCCESS) {
1888                                 i40e_log(i40e, "Add interrupt handler (MSI-X) "
1889                                     "failed: return %d, vector %d", rc, vector);


1988         rc = i40e_aq_set_phy_int_mask(hw, 0, NULL);
1989         if (rc != I40E_SUCCESS) {
1990                 i40e_error(i40e, "failed to update phy link mask: %d", rc);
1991         }
1992 }
1993 
1994 /*
1995  * Go through and re-initialize any existing filters that we may have set up for
1996  * this device. Note that we would only expect them to exist if hardware had
1997  * already been initialized and we had just reset it. While we're not
1998  * implementing this yet, we're keeping this around for when we add reset
1999  * capabilities, so this isn't forgotten.
2000  */
2001 /* ARGSUSED */
2002 static void
2003 i40e_init_macaddrs(i40e_t *i40e, i40e_hw_t *hw)
2004 {
2005 }
2006 
2007 /*
2008  * Set the properties which have common values across all the VSIs.
2009  * Consult the "Add VSI" command section (7.4.9.5.5.1) for a
2010  * complete description of these properties.
2011  */
2012 static void
2013 i40e_set_shared_vsi_props(i40e_t *i40e,
2014     struct i40e_aqc_vsi_properties_data *info, uint_t vsi_idx)
2015 {
2016         uint_t tc_queues;
2017         uint16_t vsi_qp_base;
2018 
2019         /*
2020          * It's important that we use bitwise-OR here; callers to this
2021          * function might enable other sections before calling this
2022          * function.
2023          */
2024         info->valid_sections |= LE_16(I40E_AQ_VSI_PROP_QUEUE_MAP_VALID |
2025             I40E_AQ_VSI_PROP_VLAN_VALID);
2026 
2027         /*
2028          * Calculate the starting QP index for this VSI. This base is
2029          * relative to the PF queue space; so a value of 0 for PF#1
2030          * represents the absolute index PFLAN_QALLOC_FIRSTQ for PF#1.
2031          */
2032         vsi_qp_base = vsi_idx * i40e->i40e_num_trqpairs_per_vsi;
2033         info->mapping_flags = LE_16(I40E_AQ_VSI_QUE_MAP_CONTIG);
2034         info->queue_mapping[0] =
2035             LE_16((vsi_qp_base << I40E_AQ_VSI_QUEUE_SHIFT) &
2036                 I40E_AQ_VSI_QUEUE_MASK);
2037 
2038         /*
2039          * tc_queues determines the size of the traffic class, where
2040          * the size is 2^^tc_queues to a maximum of 64 for the X710
2041          * and 128 for the X722.
2042          *
2043          * Some examples:
2044          *      i40e_num_trqpairs_per_vsi == 1 =>  tc_queues = 0, 2^^0 = 1.
2045          *      i40e_num_trqpairs_per_vsi == 7 =>  tc_queues = 3, 2^^3 = 8.
2046          *      i40e_num_trqpairs_per_vsi == 8 =>  tc_queues = 3, 2^^3 = 8.
2047          *      i40e_num_trqpairs_per_vsi == 9 =>  tc_queues = 4, 2^^4 = 16.
2048          *      i40e_num_trqpairs_per_vsi == 17 => tc_queues = 5, 2^^5 = 32.
2049          *      i40e_num_trqpairs_per_vsi == 64 => tc_queues = 6, 2^^6 = 64.
2050          */
2051         tc_queues = ddi_fls(i40e->i40e_num_trqpairs_per_vsi - 1);
2052 
2053         /*
2054          * The TC queue mapping is in relation to the VSI queue space.
2055          * Since we are only using one traffic class (TC0) we always
2056          * start at queue offset 0.
2057          */
2058         info->tc_mapping[0] =
2059             LE_16(((0 << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) &
2060                     I40E_AQ_VSI_TC_QUE_OFFSET_MASK) |
2061                 ((tc_queues << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT) &
2062                     I40E_AQ_VSI_TC_QUE_NUMBER_MASK));
2063 
2064         /*
2065          * I40E_AQ_VSI_PVLAN_MODE_ALL ("VLAN driver insertion mode")
2066          *
2067          *      Allow tagged and untagged packets to be sent to this
2068          *      VSI from the host.
2069          *
2070          * I40E_AQ_VSI_PVLAN_EMOD_NOTHING ("VLAN and UP expose mode")
2071          *
2072          *      Leave the tag on the frame and place no VLAN
2073          *      information in the descriptor. We want this mode
2074          *      because our MAC layer will take care of the VLAN tag,
2075          *      if there is one.
2076          */
2077         info->port_vlan_flags = I40E_AQ_VSI_PVLAN_MODE_ALL |
2078             I40E_AQ_VSI_PVLAN_EMOD_NOTHING;
2079 }
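
The tc_queues examples in the comment above can be checked with a short stand-alone sketch. fls_sketch() is an assumed portable stand-in for ddi_fls() (1-based position of the highest set bit, 0 for no bits), and tc_queues_for() is a hypothetical helper name used only for illustration.

```c
#include <assert.h>

/*
 * Assumed stand-in for ddi_fls(): 1-based position of the most
 * significant set bit, or 0 when no bit is set.
 */
static unsigned int
fls_sketch(unsigned long v)
{
	unsigned int pos = 0;

	while (v != 0) {
		pos++;
		v >>= 1;
	}
	return (pos);
}

/*
 * Hypothetical helper: exponent programmed as the TC size for a given
 * number of queue pairs per VSI, so that 2^result >= qps_per_vsi.
 */
static unsigned int
tc_queues_for(unsigned int qps_per_vsi)
{
	assert(qps_per_vsi > 0);
	return (fls_sketch(qps_per_vsi - 1));
}
```

Subtracting one before taking the highest set bit is what keeps exact powers of two (8, 64) from being bumped to the next exponent.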
2080 
2081 /*
2082  * Delete the VSI at this index, if one exists. We assume there is no
2083  * action we can take if this command fails but to log the failure.
2084  */
2085 static void
2086 i40e_delete_vsi(i40e_t *i40e, uint_t idx)
2087 {
2088         i40e_hw_t       *hw = &i40e->i40e_hw_space;
2089         uint16_t        seid = i40e->i40e_vsis[idx].iv_seid;
2090 
2091         if (seid != 0) {
2092                 int rc;
2093 
2094                 rc = i40e_aq_delete_element(hw, seid, NULL);
2095 
2096                 if (rc != I40E_SUCCESS) {
2097                         i40e_error(i40e, "Failed to delete VSI %d: %d",
2098                             rc, hw->aq.asq_last_status);
2099                 }
2100 
2101                 i40e->i40e_vsis[idx].iv_seid = 0;
2102         }
2103 }
2104 
2105 /*
2106  * Add a new VSI.
2107  */
2108 static boolean_t
2109 i40e_add_vsi(i40e_t *i40e, i40e_hw_t *hw, uint_t idx)
2110 {
2111         struct i40e_vsi_context ctx;
2112         i40e_rx_group_t         *rxg;
2113         int                     rc;
2114 
2115         /*
2116          * The default VSI is created by the controller. This function
2117          * creates new, non-default VSIs only.
2118          */
2119         ASSERT3U(idx, !=, 0);
2120 
2121         bzero(&ctx, sizeof (struct i40e_vsi_context));
2122         ctx.uplink_seid = i40e->i40e_veb_seid;
2123         ctx.pf_num = hw->pf_id;
2124         ctx.flags = I40E_AQ_VSI_TYPE_PF;
2125         ctx.connection_type = I40E_AQ_VSI_CONN_TYPE_NORMAL;
2126         i40e_set_shared_vsi_props(i40e, &ctx.info, idx);
2127 
2128         rc = i40e_aq_add_vsi(hw, &ctx, NULL);
2129         if (rc != I40E_SUCCESS) {
2130                 i40e_error(i40e, "i40e_aq_add_vsi() failed %d: %d", rc,
2131                     hw->aq.asq_last_status);
2132                 return (B_FALSE);
2133         }
2134 
2135         rxg = &i40e->i40e_rx_groups[idx];
2136         rxg->irg_vsi_seid = ctx.seid;
2137         i40e->i40e_vsis[idx].iv_number = ctx.vsi_number;
2138         i40e->i40e_vsis[idx].iv_seid = ctx.seid;
2139         i40e->i40e_vsis[idx].iv_stats_id = LE_16(ctx.info.stat_counter_idx);
2140 
2141         if (i40e_stat_vsi_init(i40e, idx) == B_FALSE)
2142                 return (B_FALSE);
2143 
2144         return (B_TRUE);
2145 }
2146 
2147 /*
2148  * Configure the hardware for the Default Virtual Station Interface (VSI).
2149  */
2150 static boolean_t
2151 i40e_config_def_vsi(i40e_t *i40e, i40e_hw_t *hw)
2152 {
2153         struct i40e_vsi_context ctx;
2154         i40e_rx_group_t *def_rxg;
2155         int err;
2156         struct i40e_aqc_remove_macvlan_element_data filt;
2157 
2158         bzero(&ctx, sizeof (struct i40e_vsi_context));
2159         ctx.seid = I40E_DEF_VSI_SEID(i40e);
2160         ctx.pf_num = hw->pf_id;
2161         err = i40e_aq_get_vsi_params(hw, &ctx, NULL);
2162         if (err != I40E_SUCCESS) {
2163                 i40e_error(i40e, "get VSI params failed with %d", err);
2164                 return (B_FALSE);
2165         }
2166 
2167         ctx.info.valid_sections = 0;
2168         i40e->i40e_vsis[0].iv_number = ctx.vsi_number;
2169         i40e->i40e_vsis[0].iv_stats_id = LE_16(ctx.info.stat_counter_idx);
2170         if (i40e_stat_vsi_init(i40e, 0) == B_FALSE)
2171                 return (B_FALSE);
2172 
2173         i40e_set_shared_vsi_props(i40e, &ctx.info, I40E_DEF_VSI_IDX);
2174 
2175         err = i40e_aq_update_vsi_params(hw, &ctx, NULL);
2176         if (err != I40E_SUCCESS) {
2177                 i40e_error(i40e, "Update VSI params failed with %d", err);
2178                 return (B_FALSE);
2179         }
2180 
2181         def_rxg = &i40e->i40e_rx_groups[0];
2182         def_rxg->irg_vsi_seid = I40E_DEF_VSI_SEID(i40e);
2183 
2184         /*
2185          * We have seen three different behaviors in regards to the
2186          * Default VSI and its implicit L2 MAC+VLAN filter.
2187          *
2188          * 1. It has an implicit filter for the factory MAC address
2189          *    and this filter counts against 'ifr_nmacfilt_used'.
2190          *
2191          * 2. It has an implicit filter for the factory MAC address
2192          *    and this filter DOES NOT count against 'ifr_nmacfilt_used'.
2193          *
2194          * 3. It DOES NOT have an implicit filter.
2195          *
2196          * All three of these cases are accounted for below. If we
2197          * fail to remove the L2 filter (ENOENT) then we assume there
2198          * wasn't one. Otherwise, if we successfully remove the
2199          * filter, we make sure to update the 'ifr_nmacfilt_used'
2200          * count accordingly.
2201          *
2202          * We remove this filter to prevent duplicate delivery of
2203          * packets destined for the primary MAC address as DLS will
2204          * create the same filter on a non-default VSI for the primary
2205          * MAC client.
2206          *
2207          * If you change the following code please test it across as
2208          * many X700 series controllers and firmware revisions as you
2209          * can.
2210          */
2211         bzero(&filt, sizeof (filt));
2212         bcopy(hw->mac.port_addr, filt.mac_addr, ETHERADDRL);
2213         filt.flags = I40E_AQC_MACVLAN_DEL_PERFECT_MATCH;
2214         filt.vlan_tag = 0;
2215 
2216         ASSERT3U(i40e->i40e_resources.ifr_nmacfilt_used, <=, 1);
2217         i40e_log(i40e, "Num L2 filters: %u",
2218             i40e->i40e_resources.ifr_nmacfilt_used);
2219 
2220         err = i40e_aq_remove_macvlan(hw, I40E_DEF_VSI_SEID(i40e), &filt, 1,
2221             NULL);
2222         if (err == I40E_SUCCESS) {
2223                 i40e_log(i40e,
2224                     "Removed L2 filter from Default VSI with SEID %u",
2225                     I40E_DEF_VSI_SEID(i40e));
2226         } else if (hw->aq.asq_last_status == ENOENT) {
2227                 i40e_log(i40e,
2228                     "No L2 filter for Default VSI with SEID %u",
2229                     I40E_DEF_VSI_SEID(i40e));
2230         } else {
2231                 i40e_error(i40e, "Failed to remove L2 filter from"
2232                     " Default VSI with SEID %u: %d (%d)",
2233                     I40E_DEF_VSI_SEID(i40e), err, hw->aq.asq_last_status);
2234 
2235                 return (B_FALSE);
2236         }
2237 
2238         /*
2239          *  As mentioned above, the controller created an implicit L2
2240          *  filter for the primary MAC. We want to remove both the
2241          *  filter and decrement the filter count. However, not all
2242          *  controllers count this implicit filter against the total
2243          *  MAC filter count. So here we are making sure it is either
2244          *  one or zero. If it is one, then we know it is for the
2245          *  implicit filter and we should decrement since we just
2246          *  removed the filter above. If it is zero then we know the
2247          *  controller does not count the implicit filter, and it
2248          *  was enough to just remove it; we leave the count alone.
2249          *  But if it is neither, then we have never seen a controller
2250          *  like this before and we should fail to attach.
2251          *
2252          *  It is unfortunate that this code must exist but the
2253          *  behavior of this implicit L2 filter and its corresponding
2254          *  count were discovered through empirical testing. The
2255          *  programming manuals hint at this filter but do not
2256          *  explicitly call out the exact behavior.
2257          */
2258         if (i40e->i40e_resources.ifr_nmacfilt_used == 1) {
2259                 i40e->i40e_resources.ifr_nmacfilt_used--;
2260         } else {
2261                 if (i40e->i40e_resources.ifr_nmacfilt_used != 0) {
2262                         i40e_error(i40e, "Unexpected L2 filter count: %u"
2263                             " (expected 0)",
2264                             i40e->i40e_resources.ifr_nmacfilt_used);
2265                         return (B_FALSE);
2266                 }
2267         }
2268 
2269         return (B_TRUE);
2270 }
2271 
2272 static boolean_t
2273 i40e_config_rss_key_x722(i40e_t *i40e, i40e_hw_t *hw)
2274 {
2275         for (uint_t i = 0; i < i40e->i40e_num_rx_groups; i++) {
2276                 uint32_t seed[I40E_PFQF_HKEY_MAX_INDEX + 1];
2277                 struct i40e_aqc_get_set_rss_key_data key;
2278                 const char *u8seed;
2279                 enum i40e_status_code status;
2280                 uint16_t vsi_number = i40e->i40e_vsis[i].iv_number;
2281 
2282                 (void) random_get_pseudo_bytes((uint8_t *)seed, sizeof (seed));
2283                 u8seed = (char *)seed;
2284 
2285                 CTASSERT(sizeof (key) >= (sizeof (key.standard_rss_key) +
2286                     sizeof (key.extended_hash_key)));
2287 
2288                 bcopy(u8seed, key.standard_rss_key,
2289                     sizeof (key.standard_rss_key));
2290                 bcopy(&u8seed[sizeof (key.standard_rss_key)],
2291                     key.extended_hash_key, sizeof (key.extended_hash_key));
2292 
2293                 ASSERT3U(vsi_number, !=, 0);
2294                 status = i40e_aq_set_rss_key(hw, vsi_number, &key);
2295 
2296                 if (status != I40E_SUCCESS) {
2297                         i40e_error(i40e, "failed to set RSS key for VSI %u: %d",
2298                             vsi_number, status);
2299                         return (B_FALSE);
2300                 }
2301         }
2302 
2303         return (B_TRUE);
2304 }
2305 
2306 /*
2307  * Configure the RSS key. For the X710 controller family, this is set on a
2308  * per-PF basis via registers. For the X722, this is done on a per-VSI basis
2309  * through the admin queue.
2310  */
2311 static boolean_t
2312 i40e_config_rss_key(i40e_t *i40e, i40e_hw_t *hw)
2313 {
2314         if (i40e_is_x722(i40e)) {
2315                 if (!i40e_config_rss_key_x722(i40e, hw))
2316                         return (B_FALSE);
2317         } else {
2318                 uint32_t seed[I40E_PFQF_HKEY_MAX_INDEX + 1];
2319 
2320                 (void) random_get_pseudo_bytes((uint8_t *)seed, sizeof (seed));
2321                 for (uint_t i = 0; i <= I40E_PFQF_HKEY_MAX_INDEX; i++)
2322                         i40e_write_rx_ctl(hw, I40E_PFQF_HKEY(i), seed[i]);
2323         }
2324 
2325         return (B_TRUE);
2326 }
2327 
2328 /*
2329  * Populate the LUT. The size of each entry in the LUT depends on the controller
2330  * family, with the X722 using a known 7-bit width. On the X710 controller, this
2331  * is programmed through its control registers whereas on the X722 this is
2332  * configured through the admin queue. Also of note, the X722 allows the LUT to
2333  * be set on a per-PF or VSI basis. At this time we use the PF setting. If we
2334  * decide to use the per-VSI LUT in the future, then we will need to modify the
2335  * i40e_add_vsi() function to set the RSS LUT bits in the queueing section.
2336  *
2337  * We populate the LUT in a round robin fashion with the rx queue indices from 0
2338  * to i40e_num_trqpairs_per_vsi - 1.
2339  */
2340 static boolean_t
2341 i40e_config_rss_hlut(i40e_t *i40e, i40e_hw_t *hw)
2342 {
2343         uint32_t *hlut;
2344         uint8_t lut_mask;
2345         uint_t i;
2346         boolean_t ret = B_FALSE;
2347 
2348         /*
2349          * We always configure the PF with a table size of 512 bytes in
2350          * i40e_chip_start().
2351          */
2352         hlut = kmem_alloc(I40E_HLUT_TABLE_SIZE, KM_NOSLEEP);
2353         if (hlut == NULL) {
2354                 i40e_error(i40e, "i40e_config_rss() buffer allocation failed");
2355                 return (B_FALSE);
2356         }
2357 
2358         /*
2359          * The LUT entry width on the X722 is apparently defined to be 7 bits,
2360          * regardless of the rss_table_entry_width capability.
2361          */
2362         if (i40e_is_x722(i40e)) {
2363                 lut_mask = (1 << 7) - 1;
2364         } else {
2365                 lut_mask = (1 << hw->func_caps.rss_table_entry_width) - 1;
2366         }
2367 
2368         for (i = 0; i < I40E_HLUT_TABLE_SIZE; i++) {
2369                 ((uint8_t *)hlut)[i] =
2370                     (i % i40e->i40e_num_trqpairs_per_vsi) & lut_mask;
2371         }
2372 
2373         if (i40e_is_x722(i40e)) {
2374                 enum i40e_status_code status;
2375 
2376                 status = i40e_aq_set_rss_lut(hw, 0, B_TRUE, (uint8_t *)hlut,
2377                     I40E_HLUT_TABLE_SIZE);
2378 
2379                 if (status != I40E_SUCCESS) {
2380                         i40e_error(i40e, "failed to set RSS LUT %d: %d",
2381                             status, hw->aq.asq_last_status);
2382                         goto out;
2383                 }
2384         } else {
2385                 for (i = 0; i < I40E_HLUT_TABLE_SIZE >> 2; i++) {
2386                         I40E_WRITE_REG(hw, I40E_PFQF_HLUT(i), hlut[i]);
2387                 }
2388         }
2389         ret = B_TRUE;
2390 out:
2391         kmem_free(hlut, I40E_HLUT_TABLE_SIZE);
2392         return (ret);
2393 }
2394 
2395 /*
2396  * Set up RSS.
2397  *      1. Seed the hash key.
2398  *      2. Enable PCTYPEs for the hash filter.
2399  *      3. Populate the LUT.
2400  */
2401 static boolean_t


2437         }
2438 
2439         i40e_write_rx_ctl(hw, I40E_PFQF_HENA(0), (uint32_t)hena);
2440         i40e_write_rx_ctl(hw, I40E_PFQF_HENA(1), (uint32_t)(hena >> 32));
2441 
2442         /*
2443          * 3. Populate LUT
2444          */
2445         return (i40e_config_rss_hlut(i40e, hw));
2446 }
2447 
2448 /*
2449  * Wrapper to kick the chipset on.
2450  */
2451 static boolean_t
2452 i40e_chip_start(i40e_t *i40e)
2453 {
2454         i40e_hw_t *hw = &i40e->i40e_hw_space;
2455         struct i40e_filter_control_settings filter;
2456         int rc;
2457         uint8_t err;
2458 
2459         if (((hw->aq.fw_maj_ver == 4) && (hw->aq.fw_min_ver < 33)) ||
2460             (hw->aq.fw_maj_ver < 4)) {
2461                 i40e_msec_delay(75);
2462                 if (i40e_aq_set_link_restart_an(hw, TRUE, NULL) !=
2463                     I40E_SUCCESS) {
2464                         i40e_error(i40e, "failed to restart link: admin queue "
2465                             "error: %d", hw->aq.asq_last_status);
2466                         return (B_FALSE);
2467                 }
2468         }
2469 
2470         /* Determine hardware state */
2471         i40e_get_hw_state(i40e, hw);
2472 
2473         /* For now, we always disable Ethernet Flow Control. */
2474         hw->fc.requested_mode = I40E_FC_NONE;
2475         rc = i40e_set_fc(hw, &err, B_TRUE);
2476         if (rc != I40E_SUCCESS) {
2477                 i40e_error(i40e, "Setting flow control failed, returned %d"
2478                     " with error: 0x%x", rc, err);
2479                 return (B_FALSE);
2480         }
2481 
2482         /* Initialize mac addresses. */
2483         i40e_init_macaddrs(i40e, hw);
2484 
2485         /*
2486          * Set up the filter control. If the hash lut size is changed from
2487          * I40E_HASH_LUT_SIZE_512 then I40E_HLUT_TABLE_SIZE and
2488          * i40e_config_rss_hlut() will need to be updated.
2489          */
2490         bzero(&filter, sizeof (filter));
2491         filter.enable_ethtype = TRUE;
2492         filter.enable_macvlan = TRUE;
2493         filter.hash_lut_size = I40E_HASH_LUT_SIZE_512;
2494 
2495         rc = i40e_set_filter_control(hw, &filter);
2496         if (rc != I40E_SUCCESS) {
2497                 i40e_error(i40e, "i40e_set_filter_control() returned %d", rc);
2498                 return (B_FALSE);
2499         }
2500 
2501         i40e_intr_chip_init(i40e);
2502 
2503         rc = i40e_get_mac_seid(i40e);
2504         if (rc == -1) {
2505                 i40e_error(i40e, "failed to obtain MAC Uplink SEID");
2506                 return (B_FALSE);
2507         }
2508         i40e->i40e_mac_seid = (uint16_t)rc;
2509 
2510         /*
2511          * Create a VEB in order to support multiple VSIs. Each VSI
2512          * functions as a MAC group. This call sets the PF's MAC as
2513          * the uplink port and the PF's default VSI as the default
2514          * downlink port.
2515          */
2516         rc = i40e_aq_add_veb(hw, i40e->i40e_mac_seid, I40E_DEF_VSI_SEID(i40e),
2517             0x1, B_TRUE, &i40e->i40e_veb_seid, B_FALSE, NULL);
2518         if (rc != I40E_SUCCESS) {
2519                 i40e_error(i40e, "i40e_aq_add_veb() failed %d: %d", rc,
2520                     hw->aq.asq_last_status);
2521                 return (B_FALSE);
2522         }
2523 
2524         if (!i40e_config_def_vsi(i40e, hw))
2525                 return (B_FALSE);
2526 
2527         for (uint_t i = 1; i < i40e->i40e_num_rx_groups; i++) {
2528                 if (!i40e_add_vsi(i40e, hw, i))
2529                         return (B_FALSE);
2530         }
2531 
2532         if (!i40e_config_rss(i40e, hw))
2533                 return (B_FALSE);
2534 
2535         i40e_flush(hw);
2536 
2537         return (B_TRUE);
2538 }
2539 
2540 /*
2541  * Take care of tearing down the rx rings. See 8.3.3.1.2 for more information.
2542  */
2543 static void
2544 i40e_shutdown_rx_rings(i40e_t *i40e)
2545 {
2546         int i;
2547         uint32_t reg;
2548 
2549         i40e_hw_t *hw = &i40e->i40e_hw_space;
2550 
2551         /*


2870         tctx.tphrpacket_ena = I40E_HMC_TX_TPH_DISABLE;
2871         tctx.tphwdesc_ena = I40E_HMC_TX_TPH_DISABLE;
2872         tctx.head_wb_addr = itrq->itrq_desc_area.dmab_dma_address +
2873             sizeof (i40e_tx_desc_t) * itrq->itrq_tx_ring_size;
2874 
2875         /*
2876          * Like crc, the rdylist_act field isn't actually documented, but
2877          * it appears that both should be zeroed, so we zero them here for
2878          * now. We should check with Intel on why these fields even exist.
2879          */
2880         tctx.crc = 0;
2881         tctx.rdylist_act = 0;
2882 
2883         /*
2884          * We're supposed to assign the rdylist field with the value of the
2885          * traffic class index for the first device. We query the VSI parameters
2886          * again to get what the handle is. Note that every queue is always
2887          * assigned to traffic class zero, because we don't actually use them.
2888          */
2889         bzero(&context, sizeof (struct i40e_vsi_context));
2890         context.seid = I40E_DEF_VSI_SEID(i40e);
2891         context.pf_num = hw->pf_id;
2892         err = i40e_aq_get_vsi_params(hw, &context, NULL);
2893         if (err != I40E_SUCCESS) {
2894                 i40e_error(i40e, "get VSI params failed with %d", err);
2895                 return (B_FALSE);
2896         }
2897         tctx.rdylist = LE_16(context.info.qs_handle[0]);
2898 
2899         err = i40e_clear_lan_tx_queue_context(hw, itrq->itrq_index);
2900         if (err != I40E_SUCCESS) {
2901                 i40e_error(i40e, "failed to clear tx queue %d context: %d",
2902                     itrq->itrq_index, err);
2903                 return (B_FALSE);
2904         }
2905 
2906         err = i40e_set_lan_tx_queue_context(hw, itrq->itrq_index, &tctx);
2907         if (err != I40E_SUCCESS) {
2908                 i40e_error(i40e, "failed to set tx queue %d context: %d",
2909                     itrq->itrq_index, err);
2910                 return (B_FALSE);


2974                         reg = I40E_READ_REG(hw, I40E_QTX_ENA(i));
2975 
2976                         if (reg & I40E_QTX_ENA_QENA_STAT_MASK)
2977                                 break;
2978                         i40e_msec_delay(I40E_RING_WAIT_PAUSE);
2979                 }
2980 
2981                 if ((reg & I40E_QTX_ENA_QENA_STAT_MASK) == 0) {
2982                         i40e_error(i40e, "failed to enable tx queue %d, timed "
2983                             "out", i);
2984                         return (B_FALSE);
2985                 }
2986         }
2987 
2988         return (B_TRUE);
2989 }
2990 
2991 void
2992 i40e_stop(i40e_t *i40e, boolean_t free_allocations)
2993 {
2994         uint_t i;
2995         i40e_hw_t *hw = &i40e->i40e_hw_space;
2996 
2997         ASSERT(MUTEX_HELD(&i40e->i40e_general_lock));
2998 
2999         /*
3000          * Shut down and drain the tx and rx pipeline. We do this using the
3001          * following steps.
3002          *
3003          * 1) Shut down interrupts to all the queues (trying to keep the admin
3004          *    queue alive).
3005          *
3006          * 2) Remove all of the interrupt tx and rx causes by setting the
3007          *    interrupt linked lists to zero.
3008          *
3009          * 3) Shut down the tx and rx rings. Because i40e_shutdown_rings()
3010          *    should wait for all the queues to be disabled, once we reach
3011          *    that point it should be safe to free associated data.
3012          *
3013          * 4) Wait 50ms after all that is done. This ensures that the rings are
3014          *    ready for programming again and we don't have to think about this
3015          *    in other parts of the driver.
3016          *
3017          * 5) Disable remaining chip interrupts (admin queue, etc.).
3018          *
3019          * 6) Verify that FM is happy with all the register accesses we
3020          *    performed.
3021          */
3022         i40e_intr_io_disable_all(i40e);
3023         i40e_intr_io_clear_cause(i40e);
3024 
3025         if (i40e_shutdown_rings(i40e) == B_FALSE) {
3026                 ddi_fm_service_impact(i40e->i40e_dip, DDI_SERVICE_LOST);
3027         }
3028 
3029         delay(50 * drv_usectohz(1000));
3030 
3031         /*
3032          * We don't delete the default VSI because it replaces the VEB
3033          * after VEB deletion (see the "Delete Element" section).
3034          * Furthermore, since the default VSI is provided by the
3035          * firmware, we never attempt to delete it.
3036          */
3037         for (i = 1; i < i40e->i40e_num_rx_groups; i++) {
3038                 i40e_delete_vsi(i40e, i);
3039         }
3040 
3041         if (i40e->i40e_veb_seid != 0) {
3042                 int rc = i40e_aq_delete_element(hw, i40e->i40e_veb_seid, NULL);
3043 
3044                 if (rc != I40E_SUCCESS) {
3045                         i40e_error(i40e, "Failed to delete VEB %d: %d", rc,
3046                             hw->aq.asq_last_status);
3047                 }
3048 
3049                 i40e->i40e_veb_seid = 0;
3050         }
3051 
3052         i40e_intr_chip_fini(i40e);
3053 
3054         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
3055                 mutex_enter(&i40e->i40e_trqpairs[i].itrq_rx_lock);
3056                 mutex_enter(&i40e->i40e_trqpairs[i].itrq_tx_lock);
3057         }
3058 
3059         /*
3060          * We should consider refactoring this to be part of the ring start /
3061          * stop routines at some point.
3062          */
3063         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
3064                 i40e_stats_trqpair_fini(&i40e->i40e_trqpairs[i]);
3065         }
3066 
3067         if (i40e_check_acc_handle(i40e->i40e_osdep_space.ios_cfg_handle) !=
3068             DDI_FM_OK) {
3069                 ddi_fm_service_impact(i40e->i40e_dip, DDI_SERVICE_LOST);
3070         }
3071 
3072         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
3073                 i40e_tx_cleanup_ring(&i40e->i40e_trqpairs[i]);
3074         }
3075 
3076         for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
3077                 mutex_exit(&i40e->i40e_trqpairs[i].itrq_rx_lock);
3078                 mutex_exit(&i40e->i40e_trqpairs[i].itrq_tx_lock);
3079         }
3080 
3081         for (i = 0; i < i40e->i40e_num_rx_groups; i++) {
3082                 i40e_stat_vsi_fini(i40e, i);
3083         }
3084 
3085         i40e->i40e_link_speed = 0;
3086         i40e->i40e_link_duplex = 0;
3087         i40e_link_state_set(i40e, LINK_STATE_UNKNOWN);
3088 
3089         if (free_allocations) {
3090                 i40e_free_ring_mem(i40e, B_FALSE);
3091         }
3092 }
3093 
3094 boolean_t
3095 i40e_start(i40e_t *i40e, boolean_t alloc)
3096 {
3097         i40e_hw_t *hw = &i40e->i40e_hw_space;
3098         boolean_t rc = B_TRUE;
3099         int i, err;
3100 
3101         ASSERT(MUTEX_HELD(&i40e->i40e_general_lock));
3102 
3103         if (alloc) {


3128         if (!i40e_chip_start(i40e)) {
3129                 i40e_fm_ereport(i40e, DDI_FM_DEVICE_INVAL_STATE);
3130                 rc = B_FALSE;
3131                 goto done;
3132         }
3133 
3134         if (i40e_setup_rx_rings(i40e) == B_FALSE) {
3135                 rc = B_FALSE;
3136                 goto done;
3137         }
3138 
3139         if (i40e_setup_tx_rings(i40e) == B_FALSE) {
3140                 rc = B_FALSE;
3141                 goto done;
3142         }
3143 
3144         /*
3145          * Enable broadcast traffic; however, do not enable multicast traffic.
3146          * That's handled exclusively through MAC's mc_multicst routines.
3147          */
3148         err = i40e_aq_set_vsi_broadcast(hw, I40E_DEF_VSI_SEID(i40e), B_TRUE,
3149             NULL);
3150         if (err != I40E_SUCCESS) {
3151                 i40e_error(i40e, "failed to set default VSI: %d", err);
3152                 rc = B_FALSE;
3153                 goto done;
3154         }
3155 
3156         err = i40e_aq_set_mac_config(hw, i40e->i40e_frame_max, B_TRUE, 0, NULL);
3157         if (err != I40E_SUCCESS) {
3158                 i40e_error(i40e, "failed to set MAC config: %d", err);
3159                 rc = B_FALSE;
3160                 goto done;
3161         }
3162 
3163         /*
3164          * Finally, make sure that we're happy from an FM perspective.
3165          */
3166         if (i40e_check_acc_handle(i40e->i40e_osdep_space.ios_reg_handle) !=
3167             DDI_FM_OK) {
3168                 rc = B_FALSE;
3169                 goto done;