NEX-20178 Heavy read load using 10G i40e causes network disconnect
MFV illumos-joyent@83a8d0d616db36010b59cc850d1926c0f6a30de1
OS-7457 i40e Tx freezes on zero descriptors
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Rob Johnston <rob.johnston@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
MFV illumos-joyent@0d3f2b61dcfb18edace4fd257054f6fdbe07c99c
OS-7492 i40e Tx freeze when b_cont chain exceeds 8 descriptors
Reviewed by: Robert Mustacchi <rm@joyent.com>
Reviewed by: Rob Johnston <rob.johnston@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
MFV illumos-joyent@b4bede175d4c50ac1b36078a677b69388f6fb59f
OS-7577 initialize FC for i40e
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Rob Johnston <rob.johnston@joyent.com>
NEX-19928 i40e: cannot create static IP address
Reviewed by: Cynthia Eastham <cynthia.eastham@nexenta.com>
MFV: illumos-joyent@b93056a35d6d6d301f24bc6631fc57dd2c8992c4
OS-7456 i40e default VSI sometimes lacks implicit L2 filter
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Ryan Zezeski <rpz@joyent.com>
MFV: illumos-joyent@61dc3dec4f82a3e13e94609a0a83d5f66c64e760
OS-6846 want i40e multi-group support
OS-7372 i40e_alloc_ring_mem() unwinds when it shouldn't
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Robert Mustacchi <rm@joyent.com>
Author: Ryan Zezeski <rpz@joyent.com>
MFV: illumos-joyent@9e30beee2f0c127bf41868db46257124206e28d6
OS-5225 Want Fortville TSO support
Reviewed by: Ryan Zezeski <rpz@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
Approved by: Patrick Mooney <patrick.mooney@joyent.com>
Author: Rob Johnston <rob.johnston@joyent.com>
NEX-7822 40Gb Intel XL710 NIC performance data
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>

          --- old/usr/src/uts/common/io/i40e/i40e_main.c
          +++ new/usr/src/uts/common/io/i40e/i40e_main.c
[3 lines elided]
   4    4   * You may only use this file in accordance with the terms of version
   5    5   * 1.0 of the CDDL.
   6    6   *
   7    7   * A full copy of the text of the CDDL should have accompanied this
   8    8   * source.  A copy of the CDDL is also available via the Internet at
   9    9   * http://www.illumos.org/license/CDDL.
  10   10   */
  11   11  
  12   12  /*
  13   13   * Copyright 2015 OmniTI Computer Consulting, Inc. All rights reserved.
  14      - * Copyright (c) 2017, Joyent, Inc.
       14 + * Copyright 2019 Joyent, Inc.
  15   15   * Copyright 2017 Tegile Systems, Inc.  All rights reserved.
  16   16   */
  17   17  
  18   18  /*
  19   19   * i40e - Intel 10/40 Gb Ethernet driver
  20   20   *
  21   21   * The i40e driver is the main software device driver for the Intel 40 Gb family
  22   22   * of devices. Note that these devices come in many flavors with both 40 GbE
  23   23   * ports and 10 GbE ports. This device is the successor to the 82599 family of
  24   24   * devices (ixgbe).
[105 lines elided]
 130  130   * NVRAM, there is nothing that says that two devices can't be mis-programmed to
 131  131   * have the same values in NVRAM. Instead, we uniquely identify a group of
 132  132   * functions based on their parent in the /devices tree, their PCI bus and PCI
 133  133   * function identifiers. Using either on their own may not be sufficient.
 134  134   *
 135  135   * For each unique PCI device that we encounter, we'll create a i40e_device_t.
 136  136   * From there, because we don't have a good way to tell the GLDv3 about sharing
 137  137   * resources between everything, we'll end up just dividing the resources
 138  138   * evenly between all of the functions. Longer term, if we don't have to declare
 139  139   * to the GLDv3 that these resources are shared, then we'll maintain a pool and
 140      - * hae each PF allocate from the pool in the device, thus if only two of four
      140 + * have each PF allocate from the pool in the device, thus if only two of four
 141  141   * ports are being used, for example, then all of the resources can still be
 142  142   * used.
 143  143   *
 144  144   * -------------------------------------------
 145  145   * Transmit and Receive Queue Pair Allocations
 146  146   * -------------------------------------------
 147  147   *
 148  148   * NVRAM ends up assigning each PF its own share of the transmit and receive LAN
 149  149   * queue pairs, we have no way of modifying it, only observing it. From there,
 150  150   * it's up to us to map these queues to VSIs and VFs. Since we don't support any
[11 lines elided]
 162  162   *
 163  163   * As part of the GLDv3, we need to make sure that we can handle receiving
 164  164   * broadcast and multicast traffic. As well as enabling promiscuous mode when
 165  165   * requested. GLDv3 requires that all broadcast and multicast traffic be
 166  166   * retrieved by the default group, eg. the first one. This is the same thing as
 167  167   * the default VSI.
 168  168   *
 169  169   * To receive broadcast traffic, we enable it through the admin queue, rather
 170  170   * than use one of our filters for it. For multicast traffic, we reserve a
 171  171   * certain number of the hash filters and assign them to a given PF. When we
 172      - * exceed those, we then switch to using promicuous mode for multicast traffic.
      172 + * exceed those, we then switch to using promiscuous mode for multicast traffic.
 173  173   *
 174  174   * More specifically, once we exceed the number of filters (indicated because
 175  175   * the i40e_t`i40e_resources.ifr_nmcastfilt ==
 176  176   * i40e_t`i40e_resources.ifr_nmcastfilt_used), we then instead need to toggle
 177  177   * promiscuous mode. If promiscuous mode is toggled then we keep track of the
 178  178   * number of MACs added to it by incrementing i40e_t`i40e_mcast_promisc_count.
 179  179   * That will stay enabled until that count reaches zero indicating that we have
 180  180   * only added multicast addresses that we have a corresponding entry for.
 181  181   *
 182  182   * Because MAC itself wants to toggle promiscuous mode, which includes both
 183  183   * unicast and multicast traffic, we go through and keep track of that
 184  184   * ourselves. That is maintained through the use of the i40e_t`i40e_promisc_on
 185  185   * member.
 186  186   *
 187  187   * --------------
 188  188   * VSI Management
 189  189   * --------------
 190  190   *
 191      - * At this time, we currently only support a single MAC group, and thus a single
 192      - * VSI. This VSI is considered the default VSI and should be the only one that
 193      - * exists after a reset. Currently it is stored as the member
 194      - * i40e_t`i40e_vsi_id. While this works for the moment and for an initial
 195      - * driver, it's not sufficient for the longer-term path of the driver. Instead,
 196      - * we'll want to actually have a unique i40e_vsi_t structure which is used
 197      - * everywhere. Note that this means that every place that uses the
 198      - * i40e_t`i40e_vsi_id will need to be refactored.
      191 + * The PFs share 384 VSIs. The firmware creates one VSI per PF by default.
      192 + * During chip start we retrieve the SEID of this VSI and assign it as the
      193 + * default VSI for our VEB (one VEB per PF). We then add additional VSIs to
      194 + * the VEB up to the determined number of rx groups: i40e_t`i40e_num_rx_groups.
      195 + * We currently cap this number to I40E_GROUP_MAX to a) make sure all PFs can
      196 + * allocate the same number of VSIs, and b) to keep the interrupt multiplexing
      197 + * under control. In the future, when we improve the interrupt allocation, we
      198 + * may want to revisit this cap to make better use of the available VSIs. The
      199 + * VSI allocation and configuration can be found in i40e_chip_start().
 199  200   *
 200  201   * ----------------
 201  202   * Structure Layout
 202  203   * ----------------
 203  204   *
 204  205   * The following image relates the core data structures together. The primary
 205  206   * structure in the system is the i40e_t. It itself contains multiple rings,
 206  207   * i40e_trqpair_t's which contain the various transmit and receive data. The
 207  208   * receive data is stored outside of the i40e_trqpair_t and instead in the
 208  209   * i40e_rx_data_t. The i40e_t has a corresponding i40e_device_t which keeps
[24 lines elided]
 233  234   *       |  +---------------------------+     |   +-------------------+
 234  235   *       +->| GLDv3 Device, per PF      |-----|-->| GLDv3 Device (PF) |--> ...
 235  236   *          | i40e_t                    |     |   | i40e_t            |
 236  237   *          | **Primary Structure**     |     |   +-------------------+
 237  238   *          |                           |     |
 238  239   *          | i40e_device_t *         --+-----+
 239  240   *          | i40e_state_t            --+---> Device State
 240  241   *          | i40e_hw_t               --+---> Intel common code structure
 241  242   *          | mac_handle_t            --+---> GLDv3 handle to MAC
 242  243   *          | ddi_periodic_t          --+---> Link activity timer
 243      - *          | int (vsi_id)            --+---> VSI ID, main identifier
      244 + *          | i40e_vsi_t *            --+---> Array of VSIs
 244  245   *          | i40e_func_rsrc_t        --+---> Available hardware resources
 245  246   *          | i40e_switch_rsrc_t *    --+---> Switch resource snapshot
 246  247   *          | i40e_sdu                --+---> Current MTU
 247  248   *          | i40e_frame_max          --+---> Current HW frame size
 248  249   *          | i40e_uaddr_t *          --+---> Array of assigned unicast MACs
 249  250   *          | i40e_maddr_t *          --+---> Array of assigned multicast MACs
 250  251   *          | i40e_mcast_promisccount --+---> Active multicast state
 251  252   *          | i40e_promisc_on         --+---> Current promiscuous mode state
 252      - *          | int                     --+---> Number of transmit/receive pairs
      253 + *          | uint_t                  --+---> Number of transmit/receive pairs
      254 + *          | i40e_rx_group_t *       --+---> Array of Rx groups
 253  255   *          | kstat_t *               --+---> PF kstats
 254      - *          | kstat_t *               --+---> VSI kstats
 255  256   *          | i40e_pf_stats_t         --+---> PF kstat backing data
 256      - *          | i40e_vsi_stats_t        --+---> VSI kstat backing data
 257  257   *          | i40e_trqpair_t *        --+---------+
 258  258   *          +---------------------------+         |
 259  259   *                                                |
 260  260   *                                                v
 261  261   *  +-------------------------------+       +-----------------------------+
 262  262   *  | Transmit/Receive Queue Pair   |-------| Transmit/Receive Queue Pair |->...
 263  263   *  | i40e_trqpair_t                |       | i40e_trqpair_t              |
 264  264   *  + Ring Data Structure           |       +-----------------------------+
 265  265   *  |                               |
 266  266   *  | mac_ring_handle_t             +--> MAC RX ring handle
[85 lines elided]
 352  352   *
 353  353   * -----------
 354  354   * Future Work
 355  355   * -----------
 356  356   *
 357  357   * At the moment the i40e_t driver is rather bare bones, allowing us to start
 358  358   * getting data flowing and folks using it while we develop additional features.
 359  359   * While bugs have been filed to cover this future work, the following gives an
 360  360   * overview of expected work:
 361  361   *
 362      - *  o TSO support
 363      - *  o Multiple group support
 364  362   *  o DMA binding and breaking up the locking in ring recycling.
 365  363   *  o Enhanced detection of device errors
 366  364   *  o Participation in IRM
 367  365   *  o FMA device reset
 368  366   *  o Stall detection, temperature error detection, etc.
 369  367   *  o More dynamic resource pools
 370  368   */
 371  369  
 372  370  #include "i40e_sw.h"
 373  371  
 374      -static char i40e_ident[] = "Intel 10/40Gb Ethernet v1.0.1";
      372 +static char i40e_ident[] = "Intel 10/40Gb Ethernet v1.0.3";
 375  373  
 376  374  /*
 377  375   * The i40e_glock primarily protects the lists below and the i40e_device_t
 378  376   * structures.
 379  377   */
 380  378  static kmutex_t i40e_glock;
 381  379  static list_t i40e_glist;
 382  380  static list_t i40e_dlist;
 383  381  
 384  382  /*
[369 lines elided]
 754  752  
 755  753          (void) snprintf(buf, FM_MAX_CLASS, "%s.%s", DDI_FM_DEVICE, detail);
 756  754          ena = fm_ena_generate(0, FM_ENA_FMT1);
 757  755          if (DDI_FM_EREPORT_CAP(i40e->i40e_fm_capabilities)) {
 758  756                  ddi_fm_ereport_post(i40e->i40e_dip, buf, ena, DDI_NOSLEEP,
 759  757                      FM_VERSION, DATA_TYPE_UINT8, FM_EREPORT_VERS0, NULL);
 760  758          }
 761  759  }
 762  760  
 763  761  /*
 764      - * Here we're trying to get the ID of the default VSI. In general, when we come
 765      - * through and look at this shortly after attach, we expect there to only be a
 766      - * single element present, which is the default VSI. Importantly, each PF seems
 767      - * to not see any other devices, in part because of the simple switch mode that
 768      - * we're using. If for some reason, we see more artifact, we'll need to revisit
 769      - * what we're doing here.
      762 + * Here we're trying to set the SEID of the default VSI. In general,
      763 + * when we come through and look at this shortly after attach, we
      764 + * expect there to only be a single element present, which is the
      765 + * default VSI. Importantly, each PF seems to not see any other
      766 + * devices, in part because of the simple switch mode that we're
      767 + * using. If for some reason, we see more artifacts, we'll need to
      768 + * revisit what we're doing here.
 770  769   */
 771      -static int
 772      -i40e_get_vsi_id(i40e_t *i40e)
      770 +static boolean_t
      771 +i40e_set_def_vsi_seid(i40e_t *i40e)
 773  772  {
 774  773          i40e_hw_t *hw = &i40e->i40e_hw_space;
 775  774          struct i40e_aqc_get_switch_config_resp *sw_config;
 776  775          uint8_t aq_buf[I40E_AQ_LARGE_BUF];
 777  776          uint16_t next = 0;
 778  777          int rc;
 779  778  
 780  779          /* LINTED: E_BAD_PTR_CAST_ALIGN */
 781  780          sw_config = (struct i40e_aqc_get_switch_config_resp *)aq_buf;
 782  781          rc = i40e_aq_get_switch_config(hw, sw_config, sizeof (aq_buf), &next,
 783  782              NULL);
 784  783          if (rc != I40E_SUCCESS) {
 785  784                  i40e_error(i40e, "i40e_aq_get_switch_config() failed %d: %d",
 786  785                      rc, hw->aq.asq_last_status);
 787      -                return (-1);
      786 +                return (B_FALSE);
 788  787          }
 789  788  
 790  789          if (LE_16(sw_config->header.num_reported) != 1) {
 791  790                  i40e_error(i40e, "encountered multiple (%d) switching units "
 792  791                      "during attach, not proceeding",
 793  792                      LE_16(sw_config->header.num_reported));
      793 +                return (B_FALSE);
      794 +        }
      795 +
      796 +        I40E_DEF_VSI_SEID(i40e) = sw_config->element[0].seid;
      797 +        return (B_TRUE);
      798 +}
      799 +
      800 +/*
      801 + * Get the SEID of the uplink MAC.
      802 + */
      803 +static int
      804 +i40e_get_mac_seid(i40e_t *i40e)
      805 +{
      806 +        i40e_hw_t *hw = &i40e->i40e_hw_space;
      807 +        struct i40e_aqc_get_switch_config_resp *sw_config;
      808 +        uint8_t aq_buf[I40E_AQ_LARGE_BUF];
      809 +        uint16_t next = 0;
      810 +        int rc;
      811 +
      812 +        /* LINTED: E_BAD_PTR_CAST_ALIGN */
      813 +        sw_config = (struct i40e_aqc_get_switch_config_resp *)aq_buf;
      814 +        rc = i40e_aq_get_switch_config(hw, sw_config, sizeof (aq_buf), &next,
      815 +            NULL);
      816 +        if (rc != I40E_SUCCESS) {
      817 +                i40e_error(i40e, "i40e_aq_get_switch_config() failed %d: %d",
      818 +                    rc, hw->aq.asq_last_status);
 794  819                  return (-1);
 795  820          }
 796  821  
 797      -        return (sw_config->element[0].seid);
      822 +        return (LE_16(sw_config->element[0].uplink_seid));
 798  823  }
 799  824  
 800  825  /*
 801  826   * We need to fill the i40e_hw_t structure with the capabilities of this PF. We
 802  827   * must also provide the memory for it; however, we don't need to keep it around
 803  828   * to the call to the common code. It takes it and parses it into an internal
 804  829   * structure.
 805  830   */
 806  831  static boolean_t
 807  832  i40e_get_hw_capabilities(i40e_t *i40e, i40e_hw_t *hw)
[283 lines elided]
1091 1116  
1092 1117          return (B_TRUE);
1093 1118  }
1094 1119  
1095 1120  /*
1096 1121   * Free receive & transmit rings.
1097 1122   */
1098 1123  static void
1099 1124  i40e_free_trqpairs(i40e_t *i40e)
1100 1125  {
1101      -        int i;
1102 1126          i40e_trqpair_t *itrq;
1103 1127  
     1128 +        if (i40e->i40e_rx_groups != NULL) {
     1129 +                kmem_free(i40e->i40e_rx_groups,
     1130 +                    sizeof (i40e_rx_group_t) * i40e->i40e_num_rx_groups);
     1131 +                i40e->i40e_rx_groups = NULL;
     1132 +        }
     1133 +
1104 1134          if (i40e->i40e_trqpairs != NULL) {
1105      -                for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
     1135 +                for (uint_t i = 0; i < i40e->i40e_num_trqpairs; i++) {
1106 1136                          itrq = &i40e->i40e_trqpairs[i];
1107 1137                          mutex_destroy(&itrq->itrq_rx_lock);
1108 1138                          mutex_destroy(&itrq->itrq_tx_lock);
1109 1139                          mutex_destroy(&itrq->itrq_tcb_lock);
1110 1140  
1111 1141                          /*
1112 1142                           * Should have already been cleaned up by start/stop,
1113 1143                           * etc.
1114 1144                           */
1115 1145                          ASSERT(itrq->itrq_txkstat == NULL);
[10 lines elided]
1126 1156          mutex_destroy(&i40e->i40e_general_lock);
1127 1157  }
1128 1158  
1129 1159  /*
1130 1160   * Allocate transmit and receive rings, as well as other data structures that we
1131 1161   * need.
1132 1162   */
1133 1163  static boolean_t
1134 1164  i40e_alloc_trqpairs(i40e_t *i40e)
1135 1165  {
1136      -        int i;
1137 1166          void *mutexpri = DDI_INTR_PRI(i40e->i40e_intr_pri);
1138 1167  
1139 1168          /*
1140 1169           * Now that we have the priority for the interrupts, initialize
1141 1170           * all relevant locks.
1142 1171           */
1143 1172          mutex_init(&i40e->i40e_general_lock, NULL, MUTEX_DRIVER, mutexpri);
1144 1173          mutex_init(&i40e->i40e_rx_pending_lock, NULL, MUTEX_DRIVER, mutexpri);
1145 1174          cv_init(&i40e->i40e_rx_pending_cv, NULL, CV_DRIVER, NULL);
1146 1175  
1147 1176          i40e->i40e_trqpairs = kmem_zalloc(sizeof (i40e_trqpair_t) *
1148 1177              i40e->i40e_num_trqpairs, KM_SLEEP);
1149      -        for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
     1178 +        for (uint_t i = 0; i < i40e->i40e_num_trqpairs; i++) {
1150 1179                  i40e_trqpair_t *itrq = &i40e->i40e_trqpairs[i];
1151 1180  
1152 1181                  itrq->itrq_i40e = i40e;
1153 1182                  mutex_init(&itrq->itrq_rx_lock, NULL, MUTEX_DRIVER, mutexpri);
1154 1183                  mutex_init(&itrq->itrq_tx_lock, NULL, MUTEX_DRIVER, mutexpri);
1155 1184                  mutex_init(&itrq->itrq_tcb_lock, NULL, MUTEX_DRIVER, mutexpri);
1156 1185                  itrq->itrq_index = i;
1157 1186          }
1158 1187  
     1188 +        i40e->i40e_rx_groups = kmem_zalloc(sizeof (i40e_rx_group_t) *
     1189 +            i40e->i40e_num_rx_groups, KM_SLEEP);
     1190 +
     1191 +        for (uint_t i = 0; i < i40e->i40e_num_rx_groups; i++) {
     1192 +                i40e_rx_group_t *rxg = &i40e->i40e_rx_groups[i];
     1193 +
     1194 +                rxg->irg_index = i;
     1195 +                rxg->irg_i40e = i40e;
     1196 +        }
     1197 +
1159 1198          return (B_TRUE);
1160 1199  }
1161 1200  
1162 1201  
1163 1202  
1164 1203  /*
1165 1204   * Unless a .conf file already overrode i40e_t structure values, they will
1166 1205   * be 0, and need to be set in conjunction with the now-available HW report.
1167      - *
1168      - * However, at the moment, we cap all of these resources as we only support a
1169      - * single receive ring and a single group.
1170 1206   */
1171 1207  /* ARGSUSED */
1172 1208  static void
1173 1209  i40e_hw_to_instance(i40e_t *i40e, i40e_hw_t *hw)
1174 1210  {
1175      -        if (i40e->i40e_num_trqpairs == 0) {
1176      -                i40e->i40e_num_trqpairs = I40E_TRQPAIR_MAX;
     1211 +        if (i40e->i40e_num_trqpairs_per_vsi == 0) {
     1212 +                if (i40e_is_x722(i40e)) {
     1213 +                        i40e->i40e_num_trqpairs_per_vsi =
     1214 +                            I40E_722_MAX_TC_QUEUES;
     1215 +                } else {
     1216 +                        i40e->i40e_num_trqpairs_per_vsi =
     1217 +                            I40E_710_MAX_TC_QUEUES;
     1218 +                }
1177 1219          }
1178 1220  
1179 1221          if (i40e->i40e_num_rx_groups == 0) {
1180 1222                  i40e->i40e_num_rx_groups = I40E_GROUP_MAX;
1181 1223          }
1182 1224  }
1183 1225  
1184 1226  /*
1185 1227   * Free any resources required by, or setup by, the Intel common code.
1186 1228   */
[115 lines elided]
1302 1344          }
1303 1345          bcopy(hw->mac.addr, hw->mac.perm_addr, ETHERADDRL);
1304 1346          if ((rc = i40e_get_port_mac_addr(hw, hw->mac.port_addr)) !=
1305 1347              I40E_SUCCESS) {
1306 1348                  i40e_error(i40e, "failed to retrieve port mac address: %d",
1307 1349                      rc);
1308 1350                  return (B_FALSE);
1309 1351          }
1310 1352  
1311 1353          /*
1312      -         * We need to obtain the Virtual Station ID (VSI) before we can
1313      -         * perform other operations on the device.
     1354 +         * We need to obtain the Default Virtual Station SEID (VSI)
     1355 +         * before we can perform other operations on the device.
1314 1356           */
1315      -        i40e->i40e_vsi_id = i40e_get_vsi_id(i40e);
1316      -        if (i40e->i40e_vsi_id == -1) {
1317      -                i40e_error(i40e, "failed to obtain VSI ID");
     1357 +        if (!i40e_set_def_vsi_seid(i40e)) {
     1358 +                i40e_error(i40e, "failed to obtain Default VSI SEID");
1318 1359                  return (B_FALSE);
1319 1360          }
1320 1361  
1321 1362          return (B_TRUE);
1322 1363  }
1323 1364  
1324 1365  static void
1325 1366  i40e_unconfigure(dev_info_t *devinfo, i40e_t *i40e)
1326 1367  {
1327 1368          int rc;
[224 lines elided]
1552 1593                      I40E_DESC_ALIGN);
1553 1594          }
1554 1595  
1555 1596          i40e->i40e_rx_limit_per_intr = i40e_get_prop(i40e, "rx_limit_per_intr",
1556 1597              I40E_MIN_RX_LIMIT_PER_INTR, I40E_MAX_RX_LIMIT_PER_INTR,
1557 1598              I40E_DEF_RX_LIMIT_PER_INTR);
1558 1599  
1559 1600          i40e->i40e_tx_hcksum_enable = i40e_get_prop(i40e, "tx_hcksum_enable",
1560 1601              B_FALSE, B_TRUE, B_TRUE);
1561 1602  
     1603 +        i40e->i40e_tx_lso_enable = i40e_get_prop(i40e, "tx_lso_enable",
     1604 +            B_FALSE, B_TRUE, B_TRUE);
     1605 +
1562 1606          i40e->i40e_rx_hcksum_enable = i40e_get_prop(i40e, "rx_hcksum_enable",
1563 1607              B_FALSE, B_TRUE, B_TRUE);
1564 1608  
1565 1609          i40e->i40e_rx_dma_min = i40e_get_prop(i40e, "rx_dma_threshold",
1566 1610              I40E_MIN_RX_DMA_THRESH, I40E_MAX_RX_DMA_THRESH,
1567 1611              I40E_DEF_RX_DMA_THRESH);
1568 1612  
1569 1613          i40e->i40e_tx_dma_min = i40e_get_prop(i40e, "tx_dma_threshold",
1570 1614              I40E_MIN_TX_DMA_THRESH, I40E_MAX_TX_DMA_THRESH,
1571 1615              I40E_DEF_TX_DMA_THRESH);
[149 lines elided]
1721 1765          }
1722 1766  
1723 1767          rc = ddi_intr_get_supported_types(devinfo, &intr_types);
1724 1768          if (rc != DDI_SUCCESS) {
1725 1769                  i40e_error(i40e, "failed to get supported interrupt types: %d",
1726 1770                      rc);
1727 1771                  return (B_FALSE);
1728 1772          }
1729 1773  
1730 1774          i40e->i40e_intr_type = 0;
     1775 +        i40e->i40e_num_rx_groups = I40E_GROUP_MAX;
1731 1776  
     1777 +        /*
     1778 +         * We need to determine the number of queue pairs per traffic
     1779 +         * class. We only have one traffic class (TC0), so we'll base
     1780 +         * this off the number of interrupts provided. Furthermore,
     1781 +         * since we only use one traffic class, the number of queues
     1782 +         * per traffic class and per VSI are the same.
     1783 +         */
1732 1784          if ((intr_types & DDI_INTR_TYPE_MSIX) &&
1733      -            i40e->i40e_intr_force <= I40E_INTR_MSIX) {
1734      -                if (i40e_alloc_intr_handles(i40e, devinfo,
1735      -                    DDI_INTR_TYPE_MSIX)) {
1736      -                        i40e->i40e_num_trqpairs =
1737      -                            MIN(i40e->i40e_intr_count - 1, max_trqpairs);
1738      -                        return (B_TRUE);
1739      -                }
     1785 +            (i40e->i40e_intr_force <= I40E_INTR_MSIX) &&
     1786 +            (i40e_alloc_intr_handles(i40e, devinfo, DDI_INTR_TYPE_MSIX))) {
     1787 +                uint32_t n;
     1788 +
     1789 +                /*
     1790 +                 * While we want the number of queue pairs to match
     1791 +                 * the number of interrupts, we must stay in
     1792 +                 * bounds of the maximum number of queues per traffic
     1793 +                 * class. We subtract one from i40e_intr_count to
     1794 +                 * account for interrupt zero; which is currently
     1795 +                 * restricted to admin queue commands and other
     1796 +                 * interrupt causes.
     1797 +                 */
     1798 +                n = MIN(i40e->i40e_intr_count - 1, max_trqpairs);
     1799 +                ASSERT3U(n, >, 0);
     1800 +
     1801 +                /*
     1802 +                 * Round up to the nearest power of two to ensure that
     1803 +                 * the QBASE aligns with the TC size which must be
     1804 +                 * programmed as a power of two. See the queue mapping
     1805 +                 * description in section 7.4.9.5.5.1.
     1806 +                 *
     1807 +                 * If i40e_intr_count - 1 is not a power of two then
     1808 +                 * some queue pairs on the same VSI will have to share
     1809 +                 * an interrupt.
     1810 +                 *
     1811 +                 * We may want to revisit this logic in a future where
     1812 +                 * we have more interrupts and more VSIs. Otherwise,
     1813 +                 * each VSI will use as many interrupts as possible.
     1814 +                 * Using more QPs per VSI means better RSS for each
     1815 +                 * group, but at the same time may require more
     1816 +                 * sharing of interrupts across VSIs. This may be a
     1817 +                 * good candidate for a .conf tunable.
     1818 +                 */
     1819 +                n = 0x1 << ddi_fls(n);
     1820 +                i40e->i40e_num_trqpairs_per_vsi = n;
     1821 +                ASSERT3U(i40e->i40e_num_rx_groups, >, 0);
     1822 +                i40e->i40e_num_trqpairs = i40e->i40e_num_trqpairs_per_vsi *
     1823 +                    i40e->i40e_num_rx_groups;
     1824 +                return (B_TRUE);
1740 1825          }
1741 1826  
1742 1827          /*
1743 1828           * We only use multiple transmit/receive pairs when MSI-X interrupts are
1744 1829           * available due to the fact that the device basically only supports a
1745 1830           * single MSI interrupt.
1746 1831           */
1747 1832          i40e->i40e_num_trqpairs = I40E_TRQPAIR_NOMSIX;
     1833 +        i40e->i40e_num_trqpairs_per_vsi = i40e->i40e_num_trqpairs;
1748 1834          i40e->i40e_num_rx_groups = I40E_GROUP_NOMSIX;
1749 1835  
1750 1836          if ((intr_types & DDI_INTR_TYPE_MSI) &&
1751 1837              (i40e->i40e_intr_force <= I40E_INTR_MSI)) {
1752 1838                  if (i40e_alloc_intr_handles(i40e, devinfo, DDI_INTR_TYPE_MSI))
1753 1839                          return (B_TRUE);
1754 1840          }
1755 1841  
1756 1842          if (intr_types & DDI_INTR_TYPE_FIXED) {
1757 1843                  if (i40e_alloc_intr_handles(i40e, devinfo, DDI_INTR_TYPE_FIXED))
[2 lines elided]
1760 1846  
1761 1847          return (B_FALSE);
1762 1848  }
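
Reviewer note: the MSI-X sizing in i40e_alloc_intrs() above can be tried out in userland with the sketch below. It is an illustration only, not driver code. The names sketch_fls(), sketch_qps_per_vsi() and sketch_total_qps() are hypothetical, and sketch_fls() assumes that ddi_fls() numbers bits starting at 1 and returns 0 for an empty mask. Under that assumption, "0x1 << ddi_fls(n)" yields the next power of two strictly greater than n, so an n that is already a power of two comes out doubled.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical userland stand-in for ddi_fls(): 1-based index of the most
 * significant set bit, or 0 when no bit is set. This numbering is an
 * assumption of the sketch.
 */
static int
sketch_fls(uint32_t mask)
{
	int idx = 0;

	while (mask != 0) {
		idx++;
		mask >>= 1;
	}
	return (idx);
}

/*
 * Mirrors the arithmetic in the diff: drop interrupt zero, clamp to the
 * per-traffic-class maximum, then apply "0x1 << fls(n)".
 */
static uint32_t
sketch_qps_per_vsi(uint32_t intr_count, uint32_t max_trqpairs)
{
	uint32_t n;

	assert(intr_count > 1);
	assert(max_trqpairs > 0);

	n = intr_count - 1;
	if (n > max_trqpairs)
		n = max_trqpairs;

	return ((uint32_t)1 << sketch_fls(n));
}

/* Total queue pairs: one block of per-VSI queue pairs per Rx group. */
static uint32_t
sketch_total_qps(uint32_t qps_per_vsi, uint32_t num_rx_groups)
{
	return (qps_per_vsi * num_rx_groups);
}
```

With 8 interrupts, n is 7 and the sketch gives 8 queue pairs per VSI. With 9 interrupts, n is 8 and the sketch gives 16, which is the doubling case noted above.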
1763 1849  
1764 1850  /*
1765 1851   * Map different interrupts to MSI-X vectors.
1766 1852   */
1767 1853  static boolean_t
1768 1854  i40e_map_intrs_to_vectors(i40e_t *i40e)
1769 1855  {
1770      -        int i;
1771      -
1772 1856          if (i40e->i40e_intr_type != DDI_INTR_TYPE_MSIX) {
1773 1857                  return (B_TRUE);
1774 1858          }
1775 1859  
1776 1860          /*
1777      -         * Each queue pair is mapped to a single interrupt, so transmit
1778      -         * and receive interrupts for a given queue share the same vector.
1779      -         * The number of queue pairs is one less than the number of interrupt
1780      -         * vectors and is assigned the vector one higher than its index.
1781      -         * Vector zero is reserved for the admin queue.
     1861 +         * Each queue pair is mapped to a single interrupt, so
     1862 +         * transmit and receive interrupts for a given queue share the
     1863 +         * same vector. Vector zero is reserved for the admin queue.
1782 1864           */
1783      -        ASSERT(i40e->i40e_intr_count == i40e->i40e_num_trqpairs + 1);
     1865 +        for (uint_t i = 0; i < i40e->i40e_num_trqpairs; i++) {
     1866 +                uint_t vector = i % (i40e->i40e_intr_count - 1);
1784 1867  
1785      -        for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
1786      -                i40e->i40e_trqpairs[i].itrq_rx_intrvec = i + 1;
1787      -                i40e->i40e_trqpairs[i].itrq_tx_intrvec = i + 1;
     1868 +                i40e->i40e_trqpairs[i].itrq_rx_intrvec = vector + 1;
     1869 +                i40e->i40e_trqpairs[i].itrq_tx_intrvec = vector + 1;
1788 1870          }
1789 1871  
1790 1872          return (B_TRUE);
1791 1873  }
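
Reviewer note: the new mapping in i40e_map_intrs_to_vectors() no longer requires one vector per queue pair. Queue pairs wrap around the available vectors and vector zero stays reserved for the admin queue. A minimal userland sketch of that arithmetic follows; sketch_queue_vector() is a hypothetical helper name, not part of the driver.

```c
#include <assert.h>

/*
 * Map a queue pair index to an MSI-X vector the way the loop in the diff
 * does: wrap across the (intr_count - 1) vectors left after reserving
 * vector zero, then shift up by one.
 */
static unsigned int
sketch_queue_vector(unsigned int qp_index, unsigned int intr_count)
{
	/* At least one vector must remain after reserving vector zero. */
	assert(intr_count > 1);

	return ((qp_index % (intr_count - 1)) + 1);
}
```

With intr_count of 5, queue pairs 0 through 3 land on vectors 1 through 4, and queue pair 4 wraps back to vector 1, so it shares an interrupt with queue pair 0.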
1792 1874  
1793 1875  static boolean_t
1794 1876  i40e_add_intr_handlers(i40e_t *i40e)
1795 1877  {
1796 1878          int rc, vector;
1797 1879  
[118 lines elided]
1916 1998   * implementing this yet, we're keeping this around for when we add reset
1917 1999   * capabilities, so this isn't forgotten.
1918 2000   */
1919 2001  /* ARGSUSED */
1920 2002  static void
1921 2003  i40e_init_macaddrs(i40e_t *i40e, i40e_hw_t *hw)
1922 2004  {
1923 2005  }
1924 2006  
1925 2007  /*
1926      - * Configure the hardware for the Virtual Station Interface (VSI).  Currently
1927      - * we only support one, but in the future we could instantiate more than one
1928      - * per attach-point.
     2008 + * Set the properties which have common values across all the VSIs.
     2009 + * Consult the "Add VSI" command section (7.4.9.5.5.1) for a
     2010 + * complete description of these properties.
1929 2011   */
1930      -static boolean_t
1931      -i40e_config_vsi(i40e_t *i40e, i40e_hw_t *hw)
     2012 +static void
     2013 +i40e_set_shared_vsi_props(i40e_t *i40e,
     2014 +    struct i40e_aqc_vsi_properties_data *info, uint_t vsi_idx)
1932 2015  {
1933      -        struct i40e_vsi_context context;
1934      -        int err, tc_queues;
     2016 +        uint_t tc_queues;
     2017 +        uint16_t vsi_qp_base;
1935 2018  
1936      -        bzero(&context, sizeof (struct i40e_vsi_context));
1937      -        context.seid = i40e->i40e_vsi_id;
1938      -        context.pf_num = hw->pf_id;
1939      -        err = i40e_aq_get_vsi_params(hw, &context, NULL);
1940      -        if (err != I40E_SUCCESS) {
1941      -                i40e_error(i40e, "get VSI params failed with %d", err);
1942      -                return (B_FALSE);
1943      -        }
     2019 +        /*
     2020 +         * It's important that we use bitwise-OR here; callers to this
     2021 +         * function might enable other sections before calling this
     2022 +         * function.
     2023 +         */
     2024 +        info->valid_sections |= LE_16(I40E_AQ_VSI_PROP_QUEUE_MAP_VALID |
     2025 +            I40E_AQ_VSI_PROP_VLAN_VALID);
1944 2026  
1945      -        i40e->i40e_vsi_num = context.vsi_number;
1946      -
1947 2027          /*
1948      -         * Set the queue and traffic class bits.  Keep it simple for now.
     2028 +         * Calculate the starting QP index for this VSI. This base is
     2029 +         * relative to the PF queue space; so a value of 0 for PF#1
     2030 +         * represents the absolute index PFLAN_QALLOC_FIRSTQ for PF#1.
1949 2031           */
1950      -        context.info.valid_sections = I40E_AQ_VSI_PROP_QUEUE_MAP_VALID;
1951      -        context.info.mapping_flags = I40E_AQ_VSI_QUE_MAP_CONTIG;
1952      -        context.info.queue_mapping[0] = I40E_ASSIGN_ALL_QUEUES;
     2032 +        vsi_qp_base = vsi_idx * i40e->i40e_num_trqpairs_per_vsi;
     2033 +        info->mapping_flags = LE_16(I40E_AQ_VSI_QUE_MAP_CONTIG);
     2034 +        info->queue_mapping[0] =
     2035 +            LE_16((vsi_qp_base << I40E_AQ_VSI_QUEUE_SHIFT) &
     2036 +                I40E_AQ_VSI_QUEUE_MASK);
1953 2037  
1954 2038          /*
1955      -         * tc_queues determines the size of the traffic class, where the size is
1956      -         * 2^^tc_queues to a maximum of 64 for the X710 and 128 for the X722.
     2039 +         * tc_queues determines the size of the traffic class, where
     2040 +         * the size is 2^^tc_queues to a maximum of 64 for the X710
     2041 +         * and 128 for the X722.
1957 2042           *
1958 2043           * Some examples:
1959      -         *      i40e_num_trqpairs == 1 =>  tc_queues = 0, 2^^0 = 1.
1960      -         *      i40e_num_trqpairs == 7 =>  tc_queues = 3, 2^^3 = 8.
1961      -         *      i40e_num_trqpairs == 8 =>  tc_queues = 3, 2^^3 = 8.
1962      -         *      i40e_num_trqpairs == 9 =>  tc_queues = 4, 2^^4 = 16.
1963      -         *      i40e_num_trqpairs == 17 => tc_queues = 5, 2^^5 = 32.
1964      -         *      i40e_num_trqpairs == 64 => tc_queues = 6, 2^^6 = 64.
     2044 +         *      i40e_num_trqpairs_per_vsi == 1 =>  tc_queues = 0, 2^^0 = 1.
     2045 +         *      i40e_num_trqpairs_per_vsi == 7 =>  tc_queues = 3, 2^^3 = 8.
     2046 +         *      i40e_num_trqpairs_per_vsi == 8 =>  tc_queues = 3, 2^^3 = 8.
     2047 +         *      i40e_num_trqpairs_per_vsi == 9 =>  tc_queues = 4, 2^^4 = 16.
     2048 +         *      i40e_num_trqpairs_per_vsi == 17 => tc_queues = 5, 2^^5 = 32.
     2049 +         *      i40e_num_trqpairs_per_vsi == 64 => tc_queues = 6, 2^^6 = 64.
1965 2050           */
1966      -        tc_queues = ddi_fls(i40e->i40e_num_trqpairs - 1);
     2051 +        tc_queues = ddi_fls(i40e->i40e_num_trqpairs_per_vsi - 1);
1967 2052  
1968      -        context.info.tc_mapping[0] = ((0 << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) &
1969      -            I40E_AQ_VSI_TC_QUE_OFFSET_MASK) |
1970      -            ((tc_queues << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT) &
1971      -            I40E_AQ_VSI_TC_QUE_NUMBER_MASK);
     2053 +        /*
     2054 +         * The TC queue mapping is in relation to the VSI queue space.
     2055 +         * Since we are only using one traffic class (TC0) we always
     2056 +         * start at queue offset 0.
     2057 +         */
     2058 +        info->tc_mapping[0] =
     2059 +            LE_16(((0 << I40E_AQ_VSI_TC_QUE_OFFSET_SHIFT) &
     2060 +                    I40E_AQ_VSI_TC_QUE_OFFSET_MASK) |
     2061 +                ((tc_queues << I40E_AQ_VSI_TC_QUE_NUMBER_SHIFT) &
     2062 +                    I40E_AQ_VSI_TC_QUE_NUMBER_MASK));
1972 2063  
1973      -        context.info.valid_sections |= I40E_AQ_VSI_PROP_VLAN_VALID;
1974      -        context.info.port_vlan_flags = I40E_AQ_VSI_PVLAN_MODE_ALL |
     2064 +        /*
     2065 +         * I40E_AQ_VSI_PVLAN_MODE_ALL ("VLAN driver insertion mode")
     2066 +         *
     2067 +         *      Allow tagged and untagged packets to be sent to this
     2068 +         *      VSI from the host.
     2069 +         *
     2070 +         * I40E_AQ_VSI_PVLAN_EMOD_NOTHING ("VLAN and UP expose mode")
     2071 +         *
     2072 +         *      Leave the tag on the frame and place no VLAN
     2073 +         *      information in the descriptor. We want this mode
     2074 +         *      because our MAC layer will take care of the VLAN tag,
     2075 +         *      if there is one.
     2076 +         */
     2077 +        info->port_vlan_flags = I40E_AQ_VSI_PVLAN_MODE_ALL |
1975 2078              I40E_AQ_VSI_PVLAN_EMOD_NOTHING;
     2079 +}
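
The tc_queues examples in the comment above can be checked with a small stand-alone sketch. It assumes that ddi_fls() returns the 1-based position of the most significant set bit and 0 for a zero argument; `sketch_fls` is a portable stand-in written for this example, and both helper names are hypothetical.

```c
#include <assert.h>

/*
 * Portable stand-in for ddi_fls(), under the assumption stated above:
 * 1-based index of the highest set bit, or 0 when no bit is set.
 */
static int
sketch_fls(unsigned long v)
{
	int pos = 0;

	while (v != 0) {
		pos++;
		v >>= 1;
	}
	return (pos);
}

/*
 * Hypothetical helper: the traffic class size (a power of two) that
 * results from tc_queues = fls(qps_per_vsi - 1).
 */
static unsigned int
sketch_tc_size(unsigned int qps_per_vsi)
{
	assert(qps_per_vsi >= 1);
	return (1u << sketch_fls(qps_per_vsi - 1));
}
```

The result is the queue count rounded up to the next power of two, which is why 7 and 8 queue pairs both yield a traffic class of size 8.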
1976 2080  
1977      -        context.flags = LE16_TO_CPU(I40E_AQ_VSI_TYPE_PF);
     2081 +/*
     2082 + * Delete the VSI at this index, if one exists. We assume there is no
     2083 + * action we can take if this command fails but to log the failure.
     2084 + */
     2085 +static void
     2086 +i40e_delete_vsi(i40e_t *i40e, uint_t idx)
     2087 +{
     2088 +        i40e_hw_t       *hw = &i40e->i40e_hw_space;
     2089 +        uint16_t        seid = i40e->i40e_vsis[idx].iv_seid;
1978 2090  
1979      -        i40e->i40e_vsi_stat_id = LE16_TO_CPU(context.info.stat_counter_idx);
1980      -        if (i40e_stat_vsi_init(i40e) == B_FALSE)
1981      -                return (B_FALSE);
     2091 +        if (seid != 0) {
     2092 +                int rc;
1982 2093  
1983      -        err = i40e_aq_update_vsi_params(hw, &context, NULL);
1984      -        if (err != I40E_SUCCESS) {
1985      -                i40e_error(i40e, "Update VSI params failed with %d", err);
     2094 +                rc = i40e_aq_delete_element(hw, seid, NULL);
     2095 +
     2096 +                if (rc != I40E_SUCCESS) {
     2097 +                        i40e_error(i40e, "Failed to delete VSI %d: %d",
     2098 +                            rc, hw->aq.asq_last_status);
     2099 +                }
     2100 +
     2101 +                i40e->i40e_vsis[idx].iv_seid = 0;
     2102 +        }
     2103 +}
     2104 +
     2105 +/*
     2106 + * Add a new VSI.
     2107 + */
     2108 +static boolean_t
     2109 +i40e_add_vsi(i40e_t *i40e, i40e_hw_t *hw, uint_t idx)
     2110 +{
     2111 +        struct i40e_vsi_context ctx;
     2112 +        i40e_rx_group_t         *rxg;
     2113 +        int                     rc;
     2114 +
     2115 +        /*
     2116 +         * The default VSI is created by the controller. This function
     2117 +         * creates new, non-default VSIs only.
     2118 +         */
     2119 +        ASSERT3U(idx, !=, 0);
     2120 +
     2121 +        bzero(&ctx, sizeof (struct i40e_vsi_context));
     2122 +        ctx.uplink_seid = i40e->i40e_veb_seid;
     2123 +        ctx.pf_num = hw->pf_id;
     2124 +        ctx.flags = I40E_AQ_VSI_TYPE_PF;
     2125 +        ctx.connection_type = I40E_AQ_VSI_CONN_TYPE_NORMAL;
     2126 +        i40e_set_shared_vsi_props(i40e, &ctx.info, idx);
     2127 +
     2128 +        rc = i40e_aq_add_vsi(hw, &ctx, NULL);
     2129 +        if (rc != I40E_SUCCESS) {
     2130 +                i40e_error(i40e, "i40e_aq_add_vsi() failed %d: %d", rc,
     2131 +                    hw->aq.asq_last_status);
1986 2132                  return (B_FALSE);
1987 2133          }
1988 2134  
     2135 +        rxg = &i40e->i40e_rx_groups[idx];
     2136 +        rxg->irg_vsi_seid = ctx.seid;
     2137 +        i40e->i40e_vsis[idx].iv_number = ctx.vsi_number;
     2138 +        i40e->i40e_vsis[idx].iv_seid = ctx.seid;
     2139 +        i40e->i40e_vsis[idx].iv_stats_id = LE_16(ctx.info.stat_counter_idx);
1989 2140  
     2141 +        if (i40e_stat_vsi_init(i40e, idx) == B_FALSE)
     2142 +                return (B_FALSE);
     2143 +
1990 2144          return (B_TRUE);
1991 2145  }
1992 2146  
1993 2147  /*
1994      - * Configure the RSS key. For the X710 controller family, this is set on a
1995      - * per-PF basis via registers. For the X722, this is done on a per-VSI basis
1996      - * through the admin queue.
     2148 + * Configure the hardware for the Default Virtual Station Interface (VSI).
1997 2149   */
1998 2150  static boolean_t
1999      -i40e_config_rss_key(i40e_t *i40e, i40e_hw_t *hw)
     2151 +i40e_config_def_vsi(i40e_t *i40e, i40e_hw_t *hw)
2000 2152  {
2001      -        uint32_t seed[I40E_PFQF_HKEY_MAX_INDEX + 1];
     2153 +        struct i40e_vsi_context ctx;
     2154 +        i40e_rx_group_t *def_rxg;
     2155 +        int err;
     2156 +        struct i40e_aqc_remove_macvlan_element_data filt;
2002 2157  
2003      -        (void) random_get_pseudo_bytes((uint8_t *)seed, sizeof (seed));
     2158 +        bzero(&ctx, sizeof (struct i40e_vsi_context));
     2159 +        ctx.seid = I40E_DEF_VSI_SEID(i40e);
     2160 +        ctx.pf_num = hw->pf_id;
     2161 +        err = i40e_aq_get_vsi_params(hw, &ctx, NULL);
     2162 +        if (err != I40E_SUCCESS) {
     2163 +                i40e_error(i40e, "get VSI params failed with %d", err);
     2164 +                return (B_FALSE);
     2165 +        }
2004 2166  
2005      -        if (i40e_is_x722(i40e)) {
     2167 +        ctx.info.valid_sections = 0;
     2168 +        i40e->i40e_vsis[0].iv_number = ctx.vsi_number;
     2169 +        i40e->i40e_vsis[0].iv_stats_id = LE_16(ctx.info.stat_counter_idx);
     2170 +        if (i40e_stat_vsi_init(i40e, 0) == B_FALSE)
     2171 +                return (B_FALSE);
     2172 +
     2173 +        i40e_set_shared_vsi_props(i40e, &ctx.info, I40E_DEF_VSI_IDX);
     2174 +
     2175 +        err = i40e_aq_update_vsi_params(hw, &ctx, NULL);
     2176 +        if (err != I40E_SUCCESS) {
     2177 +                i40e_error(i40e, "Update VSI params failed with %d", err);
     2178 +                return (B_FALSE);
     2179 +        }
     2180 +
     2181 +        def_rxg = &i40e->i40e_rx_groups[0];
     2182 +        def_rxg->irg_vsi_seid = I40E_DEF_VSI_SEID(i40e);
     2183 +
     2184 +        /*
     2185 +         * We have seen three different behaviors in regards to the
     2186 +         * Default VSI and its implicit L2 MAC+VLAN filter.
     2187 +         *
     2188 +         * 1. It has an implicit filter for the factory MAC address
     2189 +         *    and this filter counts against 'ifr_nmacfilt_used'.
     2190 +         *
     2191 +         * 2. It has an implicit filter for the factory MAC address
     2192 +         *    and this filter DOES NOT count against 'ifr_nmacfilt_used'.
     2193 +         *
     2194 +         * 3. It DOES NOT have an implicit filter.
     2195 +         *
     2196 +         * All three of these cases are accounted for below. If we
     2197 +         * fail to remove the L2 filter (ENOENT) then we assume there
     2198 +         * wasn't one. Otherwise, if we successfully remove the
     2199 +         * filter, we make sure to update the 'ifr_nmacfilt_used'
     2200 +         * count accordingly.
     2201 +         *
     2202 +         * We remove this filter to prevent duplicate delivery of
     2203 +         * packets destined for the primary MAC address as DLS will
     2204 +         * create the same filter on a non-default VSI for the primary
     2205 +         * MAC client.
     2206 +         *
     2207 +         * If you change the following code please test it across as
     2208 +         * many X700 series controllers and firmware revisions as you
     2209 +         * can.
     2210 +         */
     2211 +        bzero(&filt, sizeof (filt));
     2212 +        bcopy(hw->mac.port_addr, filt.mac_addr, ETHERADDRL);
     2213 +        filt.flags = I40E_AQC_MACVLAN_DEL_PERFECT_MATCH;
     2214 +        filt.vlan_tag = 0;
     2215 +
     2216 +        ASSERT3U(i40e->i40e_resources.ifr_nmacfilt_used, <=, 1);
     2217 +        i40e_log(i40e, "Num L2 filters: %u",
     2218 +            i40e->i40e_resources.ifr_nmacfilt_used);
     2219 +
     2220 +        err = i40e_aq_remove_macvlan(hw, I40E_DEF_VSI_SEID(i40e), &filt, 1,
     2221 +            NULL);
     2222 +        if (err == I40E_SUCCESS) {
     2223 +                i40e_log(i40e,
     2224 +                    "Removed L2 filter from Default VSI with SEID %u",
     2225 +                    I40E_DEF_VSI_SEID(i40e));
     2226 +        } else if (hw->aq.asq_last_status == ENOENT) {
     2227 +                i40e_log(i40e,
     2228 +                    "No L2 filter for Default VSI with SEID %u",
     2229 +                    I40E_DEF_VSI_SEID(i40e));
     2230 +        } else {
     2231 +                i40e_error(i40e, "Failed to remove L2 filter from"
     2232 +                    " Default VSI with SEID %u: %d (%d)",
     2233 +                    I40E_DEF_VSI_SEID(i40e), err, hw->aq.asq_last_status);
     2234 +
     2235 +                return (B_FALSE);
     2236 +        }
     2237 +
     2238 +        /*
     2239 +         *  As mentioned above, the controller created an implicit L2
     2240 +         *  filter for the primary MAC. We want to remove both the
     2241 +         *  filter and decrement the filter count. However, not all
     2242 +         *  controllers count this implicit filter against the total
     2243 +         *  MAC filter count. So here we are making sure it is either
     2244 +         *  one or zero. If it is one, then we know it is for the
     2245 +         *  implicit filter and we should decrement since we just
     2246 +         *  removed the filter above. If it is zero then we know the
     2247 +         *  controller does not count the implicit filter, and it
     2248 +         *  was enough to just remove it; we leave the count alone.
     2249 +         *  But if it is neither, then we have never seen a controller
     2250 +         *  like this before and we should fail to attach.
     2251 +         *
     2252 +         *  It is unfortunate that this code must exist but the
     2253 +         *  behavior of this implicit L2 filter and its corresponding
     2254 +         *  count were discovered through empirical testing. The
     2255 +         *  programming manuals hint at this filter but do not
     2256 +         *  explicitly call out the exact behavior.
     2257 +         */
     2258 +        if (i40e->i40e_resources.ifr_nmacfilt_used == 1) {
     2259 +                i40e->i40e_resources.ifr_nmacfilt_used--;
     2260 +        } else {
     2261 +                if (i40e->i40e_resources.ifr_nmacfilt_used != 0) {
     2262 +                        i40e_error(i40e, "Unexpected L2 filter count: %u"
     2263 +                            " (expected 0)",
     2264 +                            i40e->i40e_resources.ifr_nmacfilt_used);
     2265 +                            return (B_FALSE);
     2266 +                }
     2267 +        }
     2268 +
     2269 +        return (B_TRUE);
     2270 +}
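
The filter-count handling at the end of i40e_config_def_vsi() reduces to a three-way decision on the count. The sketch below isolates that decision so it can be read without the surrounding admin queue calls. The function name is hypothetical, and the plain counter stands in for 'ifr_nmacfilt_used'.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, for illustration only. Call after the implicit
 * L2 filter has been removed (or found absent):
 *   count == 1: the controller counted the implicit filter; decrement.
 *   count == 0: the controller did not count it; leave the count alone.
 *   otherwise:  unknown controller behavior; report failure.
 */
static bool
sketch_fixup_filter_count(unsigned int *used)
{
	if (*used == 1) {
		(*used)--;
		return (true);
	}

	return (*used == 0);
}
```

In every successful case the count ends at zero, which is the state the rest of the driver would then build on when adding its own filters.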
     2271 +
     2272 +static boolean_t
     2273 +i40e_config_rss_key_x722(i40e_t *i40e, i40e_hw_t *hw)
     2274 +{
     2275 +        for (uint_t i = 0; i < i40e->i40e_num_rx_groups; i++) {
     2276 +                uint32_t seed[I40E_PFQF_HKEY_MAX_INDEX + 1];
2006 2277                  struct i40e_aqc_get_set_rss_key_data key;
2007      -                const char *u8seed = (char *)seed;
     2278 +                const char *u8seed;
2008 2279                  enum i40e_status_code status;
     2280 +                uint16_t vsi_number = i40e->i40e_vsis[i].iv_number;
2009 2281  
     2282 +                (void) random_get_pseudo_bytes((uint8_t *)seed, sizeof (seed));
     2283 +                u8seed = (char *)seed;
     2284 +
2010 2285                  CTASSERT(sizeof (key) >= (sizeof (key.standard_rss_key) +
2011 2286                      sizeof (key.extended_hash_key)));
2012 2287  
2013 2288                  bcopy(u8seed, key.standard_rss_key,
2014 2289                      sizeof (key.standard_rss_key));
2015 2290                  bcopy(&u8seed[sizeof (key.standard_rss_key)],
2016 2291                      key.extended_hash_key, sizeof (key.extended_hash_key));
2017 2292  
2018      -                status = i40e_aq_set_rss_key(hw, i40e->i40e_vsi_num, &key);
     2293 +                ASSERT3U(vsi_number, !=, 0);
     2294 +                status = i40e_aq_set_rss_key(hw, vsi_number, &key);
     2295 +
2019 2296                  if (status != I40E_SUCCESS) {
2020      -                        i40e_error(i40e, "failed to set rss key: %d", status);
     2297 +                        i40e_error(i40e, "failed to set RSS key for VSI %u: %d",
     2298 +                            vsi_number, status);
2021 2299                          return (B_FALSE);
2022 2300                  }
     2301 +        }
     2302 +
     2303 +        return (B_TRUE);
     2304 +}
     2305 +
     2306 +/*
     2307 + * Configure the RSS key. For the X710 controller family, this is set on a
     2308 + * per-PF basis via registers. For the X722, this is done on a per-VSI basis
     2309 + * through the admin queue.
     2310 + */
     2311 +static boolean_t
     2312 +i40e_config_rss_key(i40e_t *i40e, i40e_hw_t *hw)
     2313 +{
     2314 +        if (i40e_is_x722(i40e)) {
     2315 +                if (!i40e_config_rss_key_x722(i40e, hw))
     2316 +                        return (B_FALSE);
2023 2317          } else {
2024      -                uint_t i;
2025      -                for (i = 0; i <= I40E_PFQF_HKEY_MAX_INDEX; i++)
     2318 +                uint32_t seed[I40E_PFQF_HKEY_MAX_INDEX + 1];
     2319 +
     2320 +                (void) random_get_pseudo_bytes((uint8_t *)seed, sizeof (seed));
     2321 +                for (uint_t i = 0; i <= I40E_PFQF_HKEY_MAX_INDEX; i++)
2026 2322                          i40e_write_rx_ctl(hw, I40E_PFQF_HKEY(i), seed[i]);
2027 2323          }
2028 2324  
2029 2325          return (B_TRUE);
2030 2326  }
2031 2327  
2032 2328  /*
2033 2329   * Populate the LUT. The size of each entry in the LUT depends on the controller
2034 2330   * family, with the X722 using a known 7-bit width. On the X710 controller, this
2035 2331   * is programmed through its control registers whereas on the X722 this is
2036 2332   * configured through the admin queue. Also of note, the X722 allows the LUT to
2037      - * be set on a per-PF or VSI basis. At this time, as we only have a single VSI,
2038      - * we use the PF setting as it is the primary VSI.
     2333 + * be set on a per-PF or VSI basis. At this time we use the PF setting. If we
     2334 + * decide to use the per-VSI LUT in the future, then we will need to modify the
     2335 + * i40e_add_vsi() function to set the RSS LUT bits in the queueing section.
2039 2336   *
2040 2337   * We populate the LUT in a round robin fashion with the rx queue indices from 0
2041      - * to i40e_num_trqpairs - 1.
     2338 + * to i40e_num_trqpairs_per_vsi - 1.
2042 2339   */
2043 2340  static boolean_t
2044 2341  i40e_config_rss_hlut(i40e_t *i40e, i40e_hw_t *hw)
2045 2342  {
2046 2343          uint32_t *hlut;
2047 2344          uint8_t lut_mask;
2048 2345          uint_t i;
2049 2346          boolean_t ret = B_FALSE;
2050 2347  
2051 2348          /*
[9 lines elided]
2061 2358          /*
2062 2359           * The width of the X722 is apparently defined to be 7 bits, regardless
2063 2360           * of the capability.
2064 2361           */
2065 2362          if (i40e_is_x722(i40e)) {
2066 2363                  lut_mask = (1 << 7) - 1;
2067 2364          } else {
2068 2365                  lut_mask = (1 << hw->func_caps.rss_table_entry_width) - 1;
2069 2366          }
2070 2367  
2071      -        for (i = 0; i < I40E_HLUT_TABLE_SIZE; i++)
2072      -                ((uint8_t *)hlut)[i] = (i % i40e->i40e_num_trqpairs) & lut_mask;
     2368 +        for (i = 0; i < I40E_HLUT_TABLE_SIZE; i++) {
     2369 +                ((uint8_t *)hlut)[i] =
     2370 +                    (i % i40e->i40e_num_trqpairs_per_vsi) & lut_mask;
     2371 +        }
2073 2372  
2074 2373          if (i40e_is_x722(i40e)) {
2075 2374                  enum i40e_status_code status;
2076      -                status = i40e_aq_set_rss_lut(hw, i40e->i40e_vsi_num, B_TRUE,
2077      -                    (uint8_t *)hlut, I40E_HLUT_TABLE_SIZE);
     2375 +
     2376 +                status = i40e_aq_set_rss_lut(hw, 0, B_TRUE, (uint8_t *)hlut,
     2377 +                    I40E_HLUT_TABLE_SIZE);
     2378 +
2078 2379                  if (status != I40E_SUCCESS) {
2079      -                        i40e_error(i40e, "failed to set RSS LUT: %d", status);
     2380 +                        i40e_error(i40e, "failed to set RSS LUT %d: %d",
     2381 +                            status, hw->aq.asq_last_status);
2080 2382                          goto out;
2081 2383                  }
2082 2384          } else {
2083 2385                  for (i = 0; i < I40E_HLUT_TABLE_SIZE >> 2; i++) {
2084 2386                          I40E_WRITE_REG(hw, I40E_PFQF_HLUT(i), hlut[i]);
2085 2387                  }
2086 2388          }
2087 2389          ret = B_TRUE;
2088 2390  out:
2089 2391          kmem_free(hlut, I40E_HLUT_TABLE_SIZE);
[55 lines elided]
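
The round-robin LUT population described for i40e_config_rss_hlut() can be sketched in isolation as follows. The helper name and parameters are hypothetical. `width` stands in for either the fixed 7-bit X722 entry width or the X710's reported entry width, and `nqueues` for i40e_num_trqpairs_per_vsi.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: fill `lut` with rx queue
 * indices 0 .. nqueues - 1 in round-robin order, masking each entry to
 * `width` bits.
 */
static void
sketch_fill_lut(uint8_t *lut, size_t len, unsigned int nqueues,
    unsigned int width)
{
	uint8_t mask;

	assert(nqueues >= 1 && width >= 1 && width <= 8);
	mask = (uint8_t)((1u << width) - 1);

	for (size_t i = 0; i < len; i++)
		lut[i] = (uint8_t)((i % nqueues) & mask);
}
```

Because every VSI has the same number of queue pairs and the TC mapping starts at offset 0 within each VSI's queue space, one table of relative indices can serve all VSIs when the per-PF LUT is used.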
2145 2447  
2146 2448  /*
2147 2449   * Wrapper to kick the chipset on.
2148 2450   */
2149 2451  static boolean_t
2150 2452  i40e_chip_start(i40e_t *i40e)
2151 2453  {
2152 2454          i40e_hw_t *hw = &i40e->i40e_hw_space;
2153 2455          struct i40e_filter_control_settings filter;
2154 2456          int rc;
     2457 +        uint8_t err;
2155 2458  
2156 2459          if (((hw->aq.fw_maj_ver == 4) && (hw->aq.fw_min_ver < 33)) ||
2157 2460              (hw->aq.fw_maj_ver < 4)) {
2158 2461                  i40e_msec_delay(75);
2159 2462                  if (i40e_aq_set_link_restart_an(hw, TRUE, NULL) !=
2160 2463                      I40E_SUCCESS) {
2161 2464                          i40e_error(i40e, "failed to restart link: admin queue "
2162 2465                              "error: %d", hw->aq.asq_last_status);
2163 2466                          return (B_FALSE);
2164 2467                  }
2165 2468          }
2166 2469  
2167 2470          /* Determine hardware state */
2168 2471          i40e_get_hw_state(i40e, hw);
2169 2472  
     2473 +        /* For now, we always disable Ethernet Flow Control. */
     2474 +        hw->fc.requested_mode = I40E_FC_NONE;
     2475 +        rc = i40e_set_fc(hw, &err, B_TRUE);
     2476 +        if (rc != I40E_SUCCESS) {
     2477 +                i40e_error(i40e, "Setting flow control failed, returned %d"
     2478 +                    " with error: 0x%x", rc, err);
     2479 +                return (B_FALSE);
     2480 +        }
     2481 +
2170 2482          /* Initialize mac addresses. */
2171 2483          i40e_init_macaddrs(i40e, hw);
2172 2484  
2173 2485          /*
2174 2486           * Set up the filter control. If the hash lut size is changed from
2175 2487           * I40E_HASH_LUT_SIZE_512 then I40E_HLUT_TABLE_SIZE and
2176 2488           * i40e_config_rss_hlut() will need to be updated.
2177 2489           */
2178 2490          bzero(&filter, sizeof (filter));
2179 2491          filter.enable_ethtype = TRUE;
[1 line elided]
2181 2493          filter.hash_lut_size = I40E_HASH_LUT_SIZE_512;
2182 2494  
2183 2495          rc = i40e_set_filter_control(hw, &filter);
2184 2496          if (rc != I40E_SUCCESS) {
2185 2497                  i40e_error(i40e, "i40e_set_filter_control() returned %d", rc);
2186 2498                  return (B_FALSE);
2187 2499          }
2188 2500  
2189 2501          i40e_intr_chip_init(i40e);
2190 2502  
2191      -        if (!i40e_config_vsi(i40e, hw))
     2503 +        rc = i40e_get_mac_seid(i40e);
     2504 +        if (rc == -1) {
     2505 +                i40e_error(i40e, "failed to obtain MAC Uplink SEID");
2192 2506                  return (B_FALSE);
     2507 +        }
     2508 +        i40e->i40e_mac_seid = (uint16_t)rc;
2193 2509  
     2510 +        /*
     2511 +         * Create a VEB in order to support multiple VSIs. Each VSI
     2512 +         * functions as a MAC group. This call sets the PF's MAC as
     2513 +         * the uplink port and the PF's default VSI as the default
     2514 +         * downlink port.
     2515 +         */
     2516 +        rc = i40e_aq_add_veb(hw, i40e->i40e_mac_seid, I40E_DEF_VSI_SEID(i40e),
     2517 +            0x1, B_TRUE, &i40e->i40e_veb_seid, B_FALSE, NULL);
     2518 +        if (rc != I40E_SUCCESS) {
     2519 +                i40e_error(i40e, "i40e_aq_add_veb() failed %d: %d", rc,
     2520 +                    hw->aq.asq_last_status);
     2521 +                return (B_FALSE);
     2522 +        }
     2523 +
     2524 +        if (!i40e_config_def_vsi(i40e, hw))
     2525 +                return (B_FALSE);
     2526 +
     2527 +        for (uint_t i = 1; i < i40e->i40e_num_rx_groups; i++) {
     2528 +                if (!i40e_add_vsi(i40e, hw, i))
     2529 +                        return (B_FALSE);
     2530 +        }
     2531 +
2194 2532          if (!i40e_config_rss(i40e, hw))
2195 2533                  return (B_FALSE);
2196 2534  
2197 2535          i40e_flush(hw);
2198 2536  
2199 2537          return (B_TRUE);
2200 2538  }
2201 2539  
2202 2540  /*
2203 2541   * Take care of tearing down the rx ring. See 8.3.3.1.2 for more information.
[338 lines elided]
2542 2880          tctx.crc = 0;
2543 2881          tctx.rdylist_act = 0;
2544 2882  
2545 2883          /*
2546 2884           * We're supposed to assign the rdylist field with the value of the
2547 2885           * traffic class index for the first device. We query the VSI parameters
2548 2886           * again to get what the handle is. Note that every queue is always
2549 2887           * assigned to traffic class zero, because we don't actually use them.
2550 2888           */
2551 2889          bzero(&context, sizeof (struct i40e_vsi_context));
2552      -        context.seid = i40e->i40e_vsi_id;
     2890 +        context.seid = I40E_DEF_VSI_SEID(i40e);
2553 2891          context.pf_num = hw->pf_id;
2554 2892          err = i40e_aq_get_vsi_params(hw, &context, NULL);
2555 2893          if (err != I40E_SUCCESS) {
2556 2894                  i40e_error(i40e, "get VSI params failed with %d", err);
2557 2895                  return (B_FALSE);
2558 2896          }
2559 2897          tctx.rdylist = LE_16(context.info.qs_handle[0]);
2560 2898  
2561 2899          err = i40e_clear_lan_tx_queue_context(hw, itrq->itrq_index);
2562 2900          if (err != I40E_SUCCESS) {
[83 lines elided]
2646 2984                          return (B_FALSE);
2647 2985                  }
2648 2986          }
2649 2987  
2650 2988          return (B_TRUE);
2651 2989  }
2652 2990  
2653 2991  void
2654 2992  i40e_stop(i40e_t *i40e, boolean_t free_allocations)
2655 2993  {
2656      -        int i;
     2994 +        uint_t i;
     2995 +        i40e_hw_t *hw = &i40e->i40e_hw_space;
2657 2996  
2658 2997          ASSERT(MUTEX_HELD(&i40e->i40e_general_lock));
2659 2998  
2660 2999          /*
2661 3000           * Shutdown and drain the tx and rx pipeline. We do this using the
2662 3001           * following steps.
2663 3002           *
2664 3003           * 1) Shutdown interrupts to all the queues (trying to keep the admin
2665 3004           *    queue alive).
2666 3005           *
[15 lines elided]
2682 3021           */
2683 3022          i40e_intr_io_disable_all(i40e);
2684 3023          i40e_intr_io_clear_cause(i40e);
2685 3024  
2686 3025          if (i40e_shutdown_rings(i40e) == B_FALSE) {
2687 3026                  ddi_fm_service_impact(i40e->i40e_dip, DDI_SERVICE_LOST);
2688 3027          }
2689 3028  
2690 3029          delay(50 * drv_usectohz(1000));
2691 3030  
     3031 +        /*
     3032 +         * We don't delete the default VSI because it replaces the VEB
     3033 +         * after VEB deletion (see the "Delete Element" section).
     3034 +         * Furthermore, since the default VSI is provided by the
     3035 +         * firmware, we never attempt to delete it.
     3036 +         */
     3037 +        for (i = 1; i < i40e->i40e_num_rx_groups; i++) {
     3038 +                i40e_delete_vsi(i40e, i);
     3039 +        }
     3040 +
     3041 +        if (i40e->i40e_veb_seid != 0) {
     3042 +                int rc = i40e_aq_delete_element(hw, i40e->i40e_veb_seid, NULL);
     3043 +
     3044 +                if (rc != I40E_SUCCESS) {
     3045 +                        i40e_error(i40e, "Failed to delete VEB %d: %d", rc,
     3046 +                            hw->aq.asq_last_status);
     3047 +                }
     3048 +
     3049 +                i40e->i40e_veb_seid = 0;
     3050 +        }
     3051 +
2692 3052          i40e_intr_chip_fini(i40e);
2693 3053  
2694 3054          for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
2695 3055                  mutex_enter(&i40e->i40e_trqpairs[i].itrq_rx_lock);
2696 3056                  mutex_enter(&i40e->i40e_trqpairs[i].itrq_tx_lock);
2697 3057          }
2698 3058  
2699 3059          /*
2700 3060           * We should consider refactoring this to be part of the ring start /
2701 3061           * stop routines at some point.
[9 lines elided]
2711 3071  
2712 3072          for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
2713 3073                  i40e_tx_cleanup_ring(&i40e->i40e_trqpairs[i]);
2714 3074          }
2715 3075  
2716 3076          for (i = 0; i < i40e->i40e_num_trqpairs; i++) {
2717 3077                  mutex_exit(&i40e->i40e_trqpairs[i].itrq_rx_lock);
2718 3078                  mutex_exit(&i40e->i40e_trqpairs[i].itrq_tx_lock);
2719 3079          }
2720 3080  
2721      -        i40e_stat_vsi_fini(i40e);
     3081 +        for (i = 0; i < i40e->i40e_num_rx_groups; i++) {
     3082 +                i40e_stat_vsi_fini(i40e, i);
     3083 +        }
2722 3084  
2723 3085          i40e->i40e_link_speed = 0;
2724 3086          i40e->i40e_link_duplex = 0;
2725 3087          i40e_link_state_set(i40e, LINK_STATE_UNKNOWN);
2726 3088  
2727 3089          if (free_allocations) {
2728 3090                  i40e_free_ring_mem(i40e, B_FALSE);
2729 3091          }
2730 3092  }
2731 3093  
[44 lines elided]
2776 3138  
2777 3139          if (i40e_setup_tx_rings(i40e) == B_FALSE) {
2778 3140                  rc = B_FALSE;
2779 3141                  goto done;
2780 3142          }
2781 3143  
2782 3144          /*
2783 3145           * Enable broadcast traffic; however, do not enable multicast traffic.
2784 3146           * That's handled exclusively through MAC's mc_multicst routines.
2785 3147           */
2786      -        err = i40e_aq_set_vsi_broadcast(hw, i40e->i40e_vsi_id, B_TRUE, NULL);
     3148 +        err = i40e_aq_set_vsi_broadcast(hw, I40E_DEF_VSI_SEID(i40e), B_TRUE,
     3149 +            NULL);
2787 3150          if (err != I40E_SUCCESS) {
2788 3151                  i40e_error(i40e, "failed to set default VSI: %d", err);
2789 3152                  rc = B_FALSE;
2790 3153                  goto done;
2791 3154          }
2792 3155  
2793 3156          err = i40e_aq_set_mac_config(hw, i40e->i40e_frame_max, B_TRUE, 0, NULL);
2794 3157          if (err != I40E_SUCCESS) {
2795 3158                  i40e_error(i40e, "failed to set MAC config: %d", err);
2796 3159                  rc = B_FALSE;
[302 lines elided]