NEX-17397 Getting panic: kernel heap corruption detected when trying to create a large number of iSCSI targets and mappings
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
NEX-18878 Getting BAD TRAP panic when sending payload to iSCSI or FC shared zvol from client
    Reviewed by: Tim Jacobson <tim.jacobson@nexenta.com>
    Reviewed by: Joyce McIntosh <joyce.mcintosh@nexenta.com>
    Reviewed by: Sanjay Nadkarni <sanjay.nadkarni@nexenta.com>
    Reviewed by: Evan Layton <evan.layton@nexenta.com>
NEX-16937 Poor performance observed during large delete / SCSI UNMAP operations (Comstar portion)
NEX-16711 STMF task workers do not scale correctly
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
NEX-15346 COMSTAR hang with thousands of threads waiting in idm_refcnt_wait_ref() from iscsit_conn_lost()
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Running stmfadm remove-hg-member caused a NULL pointer dereference panic in stmf_remove_lu_from_session
Reviewed by: Dan Fields <dan.fields@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
NEX-9567 Deadlock: cycle in blocking chain panic
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
NEX-7273 nbmand makes NFSv4 RENAME of an open file inoperable
Reviewed by: Matt Barden <matt.barden@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
NEX-7681 System panics: Deadlock: cycle in blocking chain
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
NEX-6018 Return of the walking dead idm_refcnt_wait_ref comstar threads
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Reviewed by: Evan Layton <evan.layton@nexenta.com>
NEX-5428 Backout the 5.0 changes
NEX-2937 Continuous write_same starves all other commands
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-4953 want uintptr32_t in sys/types32.h
Reviewed by: Josef 'Jeff' Sipek <josef.sipek@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-4905 While deleting an iSCSI target, the appliance panics (restarts)
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
NEX-4928 Kernel panic in stmf_deregister_lu function
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
SUP-770 deadlock between thread acquiring iss->iss_lockp in stmf_task_free() and thread holding sl->sl_pgr->pgr_lock from sbd_pgr_remove_it_handle()
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-3622 COMSTAR should have per remote port kstats for I/O and latency
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-4026 panic in smb_session_send
NEX-3738 Should support SMB2_CAP_LARGE_MTU (uio bug)
Reviewed by: Alek Pinchuk <alek.pinchuk@nexenta.com>
Reviewed by: Yuri Pankov <yuri.pankov@nexenta.com>
NEX-3566 assertion failed: lu == dlun0 || (ilu->ilu_state != STMF_STATE_OFFLINING && ilu->ilu_state != STMF_STATE_OFFLINE), file: ../../common/io/comstar/stmf/stmf.c, line: 4063
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-3204 Panic doing FC rescan from ESXi 5.5u1 with VAAI enabled
Reviewed by: Steve Peng <steve.peng@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-3217 Panic running benchmark at ESX VM
NEX-3204 Panic doing FC rescan from ESXi 5.5u1 with VAAI enabled
        Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
        Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
NEX-2529 NFS service fails to start after hostname change
Reviewed by: Gordon Ross <gordon.ross@nexenta.com>
NEX-3259 COMSTAR has insufficient task workers
        Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-3169 STMF has duplicate code in 6 places, which is error-prone.
Reviewed by: Rob Gittins <rob.gittins@nexenta.com>
Reviewed by: Steve Peng <steve.peng@nexenta.com>
NEX-3111 Comstar does not pass cstyle and hdrchk
        Reviewed by: Jean McCormack <jean.mccormack@nexenta.com>
        Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
        Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
NEX-3104 Panic in stmf_dlun0_done
        Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
NEX-3023 Panics and hangs when using write_same and compare_and_write
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
Reviewed by: Rick McNeal <rick.mcneal@nexenta.com>
Reviewed by: Jean McCormack <jean.mccormack@nexenta.com>
Approved by: Jean McCormack <jean.mccormack@nexenta.com>
Related bug: NEX-2723 Kernel panic in xfer_completion code for write_same (0x93) and compare_and_write (0x89)
SUP-772 COMSTAR task waiting pressure doesn't increase worker thread pool sizing above initial value
SUP-765 When a Windows Clustered Shared Volume is placed on a pool under Nexenta HA Cluster control, the clustered shared disk loses its PGR3 reservation to the presented zvol.
Reviewed by: Bayard Bell <bayard.bell@nexenta.com>
Reviewed by: Tony Nguyen <tony.nguyen@nexenta.com>
Reviewed by: Josef Sipek <josef.sipek@nexenta.com>
NEX-988 itask_lu_[read|write]_time was inadvertently removed by the Illumos 3862 fix
SUP-540 panic on page fault in stmf_task_lu_free()
re #13796 OSX FC Initiator cannot attach to LUN if LUN id is different than other OSX
re #7936 rb3706 Support for COMSTAR/OEM
re #8002 rb3706 Allow setting iSCSI vendor ID via stmf_sbd.conf
re #11454 rb3750 Fix inconsistent vid/pid in stmf
re #7550 rb2134 lint-clean nza-kernel
Re #6790 backspace should perform delete on console
VAAI (XXX ATS support for COMSTAR, YYY Block-copy support for COMSTAR)

*** 16,30 ****
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
  /*
  * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
  */
  /*
! * Copyright 2012, Nexenta Systems, Inc. All rights reserved.
  * Copyright (c) 2013 by Delphix. All rights reserved.
  * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
  */
  
  #include <sys/conf.h>
--- 16,31 ----
  * fields enclosed by brackets "[]" replaced with your own identifying
  * information: Portions Copyright [yyyy] [name of copyright owner]
  *
  * CDDL HEADER END
  */
+ 
  /*
  * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
  */
  /*
! * Copyright 2019 Nexenta Systems, Inc. All rights reserved.
  * Copyright (c) 2013 by Delphix. All rights reserved.
  * Copyright (c) 2013 by Saso Kiselkov. All rights reserved.
  */
  
  #include <sys/conf.h>
*** 122,132 ****
  void stmf_delete_ppd(stmf_pp_data_t *ppd);
  void stmf_delete_all_ppds();
  void stmf_trace_clear();
  void stmf_worker_init();
  stmf_status_t stmf_worker_fini();
- void stmf_worker_mgmt();
  void stmf_worker_task(void *arg);
  static void stmf_task_lu_free(scsi_task_t *task, stmf_i_scsi_session_t *iss);
  static stmf_status_t stmf_ic_lu_reg(stmf_ic_reg_dereg_lun_msg_t *msg,
      uint32_t type);
  static stmf_status_t stmf_ic_lu_dereg(stmf_ic_reg_dereg_lun_msg_t *msg);
--- 123,132 ----
*** 162,174 ****
--- 162,179 ----
  static void stmf_update_kstat_lu_q(scsi_task_t *, void());
  static void stmf_update_kstat_lport_q(scsi_task_t *, void());
  static void stmf_update_kstat_lu_io(scsi_task_t *, stmf_data_buf_t *);
  static void stmf_update_kstat_lport_io(scsi_task_t *, stmf_data_buf_t *);
+ static hrtime_t stmf_update_rport_timestamps(hrtime_t *start_tstamp,
+     hrtime_t *done_tstamp, stmf_i_scsi_task_t *itask);
  static int stmf_irport_compare(const void *void_irport1,
      const void *void_irport2);
+ static void stmf_create_kstat_rport(stmf_i_remote_port_t *irport);
+ static void stmf_destroy_kstat_rport(stmf_i_remote_port_t *irport);
+ static int stmf_kstat_rport_update(kstat_t *ksp, int rw);
  static stmf_i_remote_port_t *stmf_irport_create(scsi_devid_desc_t *rport_devid);
  static void stmf_irport_destroy(stmf_i_remote_port_t *irport);
  static stmf_i_remote_port_t *stmf_irport_register(
      scsi_devid_desc_t *rport_devid);
  static stmf_i_remote_port_t *stmf_irport_lookup_locked(
*** 177,187 ****
  extern struct mod_ops mod_driverops;
  
  /* =====[ Tunables ]===== */
  /* Internal tracing */
! volatile int stmf_trace_on = 1;
  volatile int stmf_trace_buf_size = (1 * 1024 * 1024);
  
  /*
   * The reason default task timeout is 75 is because we want the
   * host to timeout 1st and mostly host timeout is 60 seconds.
   */
--- 182,192 ----
  extern struct mod_ops mod_driverops;
  
  /* =====[ Tunables ]===== */
  /* Internal tracing */
! volatile int stmf_trace_on = 0;
  volatile int stmf_trace_buf_size = (1 * 1024 * 1024);
  
  /*
   * The reason default task timeout is 75 is because we want the
   * host to timeout 1st and mostly host timeout is 60 seconds.
   */
*** 190,207 ****
  * Setting this to one means, you are responsible for config load and keeping
  * things in sync with persistent database.
  */
  volatile int stmf_allow_modunload = 0;
  
! volatile int stmf_max_nworkers = 256;
! volatile int stmf_min_nworkers = 4;
! volatile int stmf_worker_scale_down_delay = 20;
  
  /* === [ Debugging and fault injection ] === */
  #ifdef DEBUG
! volatile uint32_t stmf_drop_task_counter = 0;
! volatile uint32_t stmf_drop_buf_counter = 0;
  #endif
  
  stmf_state_t stmf_state;
  static stmf_lu_t *dlun0;
--- 195,211 ----
  * Setting this to one means, you are responsible for config load and keeping
  * things in sync with persistent database.
  */
  volatile int stmf_allow_modunload = 0;
  
! volatile int stmf_nworkers = 512;
! volatile int stmf_worker_warn = 0;
  
  /* === [ Debugging and fault injection ] === */
  #ifdef DEBUG
! volatile int stmf_drop_task_counter = 0;
! volatile int stmf_drop_buf_counter = 0;
  #endif
  
  stmf_state_t stmf_state;
  static stmf_lu_t *dlun0;
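The hunk above replaces the old stmf_max_nworkers/stmf_min_nworkers/stmf_worker_scale_down_delay trio with a single fixed pool size, stmf_nworkers. Since it is an ordinary volatile int tunable, it can be overridden at boot through the standard /etc/system mechanism; a minimal sketch, with an illustrative value rather than a recommendation:

* /etc/system: size the STMF worker pool at boot.
* Module prefix "stmf" matches STMF_MODULE_NAME in the hunk above.
set stmf:stmf_nworkers = 256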
*** 219,242 ****
  static enum {
      STMF_WORKERS_DISABLED = 0,
      STMF_WORKERS_ENABLING,
      STMF_WORKERS_ENABLED
  } stmf_workers_state = STMF_WORKERS_DISABLED;
! static int stmf_i_max_nworkers;
! static int stmf_i_min_nworkers;
! static int stmf_nworkers_cur;       /* # of workers currently running */
! static int stmf_nworkers_needed;    /* # of workers need to be running */
  static int stmf_worker_sel_counter = 0;
  static uint32_t stmf_cur_ntasks = 0;
! static clock_t stmf_wm_last = 0;
! /*
!  * This is equal to stmf_nworkers_cur while we are increasing # workers and
!  * stmf_nworkers_needed while we are decreasing the worker count.
!  */
  static int stmf_nworkers_accepting_cmds;
  static stmf_worker_t *stmf_workers = NULL;
- static clock_t stmf_worker_mgmt_delay = 2;
  static clock_t stmf_worker_scale_down_timer = 0;
  static int stmf_worker_scale_down_qd = 0;
  
  static struct cb_ops stmf_cb_ops = {
      stmf_open,          /* open */
--- 223,239 ----
  static enum {
      STMF_WORKERS_DISABLED = 0,
      STMF_WORKERS_ENABLING,
      STMF_WORKERS_ENABLED
  } stmf_workers_state = STMF_WORKERS_DISABLED;
! static kmutex_t stmf_worker_sel_mx;
! volatile uint32_t stmf_nworkers_cur = 0; /* # of workers currently running */
  static int stmf_worker_sel_counter = 0;
  static uint32_t stmf_cur_ntasks = 0;
! static clock_t stmf_wm_next = 0;
  static int stmf_nworkers_accepting_cmds;
  static stmf_worker_t *stmf_workers = NULL;
  static clock_t stmf_worker_scale_down_timer = 0;
  static int stmf_worker_scale_down_qd = 0;
  
  static struct cb_ops stmf_cb_ops = {
      stmf_open,          /* open */
*** 271,283 ****
      &stmf_cb_ops,
      NULL,           /* bus_ops */
      NULL            /* power */
  };
  
! #define STMF_NAME           "COMSTAR STMF"
  #define STMF_MODULE_NAME    "stmf"
  
  static struct modldrv modldrv = {
      &mod_driverops,
      STMF_NAME,
      &stmf_ops
  };
--- 268,286 ----
      &stmf_cb_ops,
      NULL,           /* bus_ops */
      NULL            /* power */
  };
  
! #define STMF_MODULE_NAME    "stmf"
+ 
+ #ifdef DEBUG
+ #define STMF_NAME   "COMSTAR STMF D " __DATE__ " " __TIME__
+ #else
+ #define STMF_NAME   "COMSTAR STMF"
+ #endif
+ 
  static struct modldrv modldrv = {
      &mod_driverops,
      STMF_NAME,
      &stmf_ops
  };
*** 298,307 ****
--- 301,311 ----
          return (ret);
      stmf_trace_buf = kmem_zalloc(stmf_trace_buf_size, KM_SLEEP);
      trace_buf_size = stmf_trace_buf_size;
      trace_buf_curndx = 0;
      mutex_init(&trace_buf_lock, NULL, MUTEX_DRIVER, 0);
+     mutex_init(&stmf_worker_sel_mx, NULL, MUTEX_ADAPTIVE, 0);
      bzero(&stmf_state, sizeof (stmf_state_t));
      /* STMF service is off by default */
      stmf_state.stmf_service_running = 0;
      /* default lu/lport states are online */
      stmf_state.stmf_default_lu_state = STMF_STATE_ONLINE;
*** 368,377 ****
--- 372,382 ----
      id_space_destroy(stmf_state.stmf_irport_inst_space);
      kmem_free(stmf_trace_buf, stmf_trace_buf_size);
      mutex_destroy(&trace_buf_lock);
      mutex_destroy(&stmf_state.stmf_lock);
+     mutex_destroy(&stmf_worker_sel_mx);
      cv_destroy(&stmf_state.stmf_cv);
      return (ret);
  }
  
  int
*** 1653,1666 ****
--- 1658,1674 ----
      }
      mutex_enter(&ilu->ilu_task_lock);
      for (itask = ilu->ilu_tasks; itask != NULL;
          itask = itask->itask_lu_next) {
+         mutex_enter(&itask->itask_mutex);
          if (itask->itask_flags & (ITASK_IN_FREE_LIST |
              ITASK_BEING_ABORTED)) {
+             mutex_exit(&itask->itask_mutex);
              continue;
          }
+         mutex_exit(&itask->itask_mutex);
          if (itask->itask_proxy_msg_id == task_msgid) {
              break;
          }
      }
      mutex_exit(&ilu->ilu_task_lock);
*** 1901,1929 ****
      * If this is a task management function, we're really just
      * duping the command to the peer. Set the TM bit so that
      * we can recognize this on return since we won't be completing
      * the proxied task in that case.
      */
      if (task->task_mgmt_function) {
          itask->itask_proxy_msg_id |= MSG_ID_TM_BIT;
      } else {
!         uint32_t new, old;
!         do {
!             new = old = itask->itask_flags;
!             if (new & ITASK_BEING_ABORTED)
                  return (STMF_FAILURE);
-             new |= ITASK_DEFAULT_HANDLING | ITASK_PROXY_TASK;
-         } while (atomic_cas_32(&itask->itask_flags, old, new) != old);
      }
      if (dbuf) {
          ic_cmd_msg = ic_scsi_cmd_msg_alloc(itask->itask_proxy_msg_id,
              task, dbuf->db_data_size, dbuf->db_sglist[0].seg_addr,
              itask->itask_proxy_msg_id);
      } else {
          ic_cmd_msg = ic_scsi_cmd_msg_alloc(itask->itask_proxy_msg_id,
              task, 0, NULL, itask->itask_proxy_msg_id);
      }
      if (ic_cmd_msg) {
          ic_ret = ic_tx_msg(ic_cmd_msg);
          if (ic_ret == STMF_IC_MSG_SUCCESS) {
              ret = STMF_SUCCESS;
          }
--- 1909,1937 ----
      * If this is a task management function, we're really just
      * duping the command to the peer. Set the TM bit so that
      * we can recognize this on return since we won't be completing
      * the proxied task in that case.
      */
+     mutex_enter(&itask->itask_mutex);
      if (task->task_mgmt_function) {
          itask->itask_proxy_msg_id |= MSG_ID_TM_BIT;
      } else {
!         if (itask->itask_flags & ITASK_BEING_ABORTED) {
!             mutex_exit(&itask->itask_mutex);
              return (STMF_FAILURE);
          }
+         itask->itask_flags |= ITASK_DEFAULT_HANDLING | ITASK_PROXY_TASK;
+     }
      if (dbuf) {
          ic_cmd_msg = ic_scsi_cmd_msg_alloc(itask->itask_proxy_msg_id,
              task, dbuf->db_data_size, dbuf->db_sglist[0].seg_addr,
              itask->itask_proxy_msg_id);
      } else {
          ic_cmd_msg = ic_scsi_cmd_msg_alloc(itask->itask_proxy_msg_id,
              task, 0, NULL, itask->itask_proxy_msg_id);
      }
+     mutex_exit(&itask->itask_mutex);
      if (ic_cmd_msg) {
          ic_ret = ic_tx_msg(ic_cmd_msg);
          if (ic_ret == STMF_IC_MSG_SUCCESS) {
              ret = STMF_SUCCESS;
          }
*** 2523,2532 ****
--- 2531,2541 ----
          mutex_exit(&stmf_state.stmf_lock);
          return (ret);
      }
  
      /* Free any existing lists and add this one to the ppd */
+     if (ppd->ppd_nv)
          nvlist_free(ppd->ppd_nv);
      ppd->ppd_nv = nv;
  
      /* set the token for writes */
      ppd->ppd_token++;
*** 2595,2604 ****
--- 2604,2614 ----
  
      if (*pppd == NULL)
          return;
  
      *pppd = ppd->ppd_next;
+     if (ppd->ppd_nv)
          nvlist_free(ppd->ppd_nv);
  
      kmem_free(ppd, ppd->ppd_alloc_size);
  }
*** 2704,2713 ****
--- 2714,2725 ----
  * 16 is the max string length of a protocol_ident, increase
  * the size if needed.
  */
  #define STMF_KSTAT_LU_SZ    (STMF_GUID_INPUT + 1 + 256)
  #define STMF_KSTAT_TGT_SZ   (256 * 2 + 16)
+ #define STMF_KSTAT_RPORT_DATAMAX    (sizeof (stmf_kstat_rport_info_t) / \
+     sizeof (kstat_named_t))
  
  /*
  * This array matches the Protocol Identifier in stmf_ioctl.h
  */
  #define MAX_PROTO_STR_LEN   32
*** 2781,2790 ****
--- 2793,2892 ----
          }
      }
  }
  
  static void
+ stmf_update_kstat_rport_io(scsi_task_t *task, stmf_data_buf_t *dbuf)
+ {
+     stmf_i_scsi_session_t   *iss;
+     stmf_i_remote_port_t    *irport;
+     kstat_io_t      *kip;
+ 
+     iss = task->task_session->ss_stmf_private;
+     irport = iss->iss_irport;
+     if (irport->irport_kstat_io != NULL) {
+         kip = KSTAT_IO_PTR(irport->irport_kstat_io);
+         mutex_enter(irport->irport_kstat_io->ks_lock);
+         STMF_UPDATE_KSTAT_IO(kip, dbuf);
+         mutex_exit(irport->irport_kstat_io->ks_lock);
+     }
+ }
+ 
+ static void
+ stmf_update_kstat_rport_estat(scsi_task_t *task)
+ {
+     stmf_i_scsi_task_t      *itask;
+     stmf_i_scsi_session_t   *iss;
+     stmf_i_remote_port_t    *irport;
+     stmf_kstat_rport_estat_t    *ks_estat;
+     hrtime_t    lat = 0;
+     uint32_t    n = 0;
+ 
+     itask = task->task_stmf_private;
+     iss = task->task_session->ss_stmf_private;
+     irport = iss->iss_irport;
+ 
+     if (irport->irport_kstat_estat == NULL)
+         return;
+ 
+     ks_estat = (stmf_kstat_rport_estat_t *)KSTAT_NAMED_PTR(
+         irport->irport_kstat_estat);
+ 
+     mutex_enter(irport->irport_kstat_estat->ks_lock);
+ 
+     if (task->task_flags & TF_READ_DATA)
+         n = atomic_dec_32_nv(&irport->irport_nread_tasks);
+     else if (task->task_flags & TF_WRITE_DATA)
+         n = atomic_dec_32_nv(&irport->irport_nwrite_tasks);
+ 
+     if (itask->itask_read_xfer > 0) {
+         ks_estat->i_nread_tasks.value.ui64++;
+         lat = stmf_update_rport_timestamps(
+             &irport->irport_rdstart_timestamp,
+             &irport->irport_rddone_timestamp, itask);
+         if (n == 0)
+             ks_estat->i_rport_read_latency.value.ui64 += lat;
+     } else if ((itask->itask_write_xfer > 0) ||
+         (task->task_flags & TF_INITIAL_BURST)) {
+         ks_estat->i_nwrite_tasks.value.ui64++;
+         lat = stmf_update_rport_timestamps(
+             &irport->irport_wrstart_timestamp,
+             &irport->irport_wrdone_timestamp, itask);
+         if (n == 0)
+             ks_estat->i_rport_write_latency.value.ui64 += lat;
+     }
+ 
+     if (n == 0) {
+         if (task->task_flags & TF_READ_DATA) {
+             irport->irport_rdstart_timestamp = LLONG_MAX;
+             irport->irport_rddone_timestamp = 0;
+         } else if (task->task_flags & TF_WRITE_DATA) {
+             irport->irport_wrstart_timestamp = LLONG_MAX;
+             irport->irport_wrdone_timestamp = 0;
+         }
+     }
+ 
+     mutex_exit(irport->irport_kstat_estat->ks_lock);
+ }
+ 
+ static hrtime_t
+ stmf_update_rport_timestamps(hrtime_t *start_tstamp, hrtime_t *done_tstamp,
+     stmf_i_scsi_task_t *itask)
+ {
+     *start_tstamp = MIN(*start_tstamp, itask->itask_start_timestamp);
+     if ((*done_tstamp == 0) &&
+         (itask->itask_xfer_done_timestamp == 0)) {
+         *done_tstamp = *start_tstamp;
+     } else {
+         *done_tstamp = MAX(*done_tstamp,
+             itask->itask_xfer_done_timestamp);
+     }
+ 
+     return (*done_tstamp - *start_tstamp);
+ }
+ 
+ static void
  stmf_update_kstat_lu_io(scsi_task_t *task, stmf_data_buf_t *dbuf)
  {
      stmf_i_lu_t     *ilu;
      kstat_io_t      *kip;
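The estat bookkeeping above measures per-remote-port latency as the length of a busy interval: the window start is the minimum start timestamp over overlapping tasks, the end is the maximum completion timestamp, and the interval is credited when the outstanding-task count drops to zero. A simplified, self-contained C sketch of that idea (illustrative names only, not the stmf.c implementation):

#include <limits.h>

/* Illustrative busy-interval accounting, one direction (read or write). */
typedef struct busy_window {
	long long	bw_start;	/* min start time of overlapping tasks */
	long long	bw_done;	/* max completion time seen so far */
	unsigned	bw_outstanding;	/* tasks currently in flight */
	long long	bw_total;	/* accumulated busy time */
} busy_window_t;

static void
bw_task_start(busy_window_t *bw, long long now)
{
	if (now < bw->bw_start)
		bw->bw_start = now;
	bw->bw_outstanding++;
}

static void
bw_task_done(busy_window_t *bw, long long now)
{
	if (now > bw->bw_done)
		bw->bw_done = now;
	if (--bw->bw_outstanding == 0) {
		/* Burst fully drained: credit one busy interval. */
		bw->bw_total += bw->bw_done - bw->bw_start;
		bw->bw_start = LLONG_MAX;	/* reset, as the hunk does */
		bw->bw_done = 0;
	}
}

Crediting only when the count reaches zero is what keeps overlapping commands from being double-counted; it mirrors the LLONG_MAX reset visible in the hunk above.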
*** 3441,3463 ****
--- 3543,3684 ----
          (struct scsi_devid_desc *)(irport + 1); /* Ptr. Arith. */
      bcopy(rport_devid, irport->irport_id,
          sizeof (scsi_devid_desc_t) + rport_devid->ident_length - 1);
      irport->irport_refcnt = 1;
      mutex_init(&irport->irport_mutex, NULL, MUTEX_DEFAULT, NULL);
+     irport->irport_rdstart_timestamp = LLONG_MAX;
+     irport->irport_wrstart_timestamp = LLONG_MAX;
  
      return (irport);
  }
  
  static void
  stmf_irport_destroy(stmf_i_remote_port_t *irport)
  {
+     stmf_destroy_kstat_rport(irport);
      id_free(stmf_state.stmf_irport_inst_space, irport->irport_instance);
      mutex_destroy(&irport->irport_mutex);
      kmem_free(irport, sizeof (*irport) + sizeof (scsi_devid_desc_t) +
          irport->irport_id->ident_length - 1);
  }
  
+ static void
+ stmf_create_kstat_rport(stmf_i_remote_port_t *irport)
+ {
+     scsi_devid_desc_t *id = irport->irport_id;
+     char ks_nm[KSTAT_STRLEN];
+     stmf_kstat_rport_info_t *ks_info;
+     stmf_kstat_rport_estat_t *ks_estat;
+     char *ident = NULL;
+ 
+     ks_info = kmem_zalloc(sizeof (*ks_info), KM_NOSLEEP);
+     if (ks_info == NULL)
+         goto err_out;
+ 
+     (void) snprintf(ks_nm, KSTAT_STRLEN, "stmf_rport_%"PRIxPTR"",
+         (uintptr_t)irport);
+     irport->irport_kstat_info = kstat_create(STMF_MODULE_NAME, 0,
+         ks_nm, "misc", KSTAT_TYPE_NAMED,
+         STMF_KSTAT_RPORT_DATAMAX - STMF_RPORT_INFO_LIMIT,
+         KSTAT_FLAG_VIRTUAL | KSTAT_FLAG_VAR_SIZE);
+     if (irport->irport_kstat_info == NULL) {
+         kmem_free(ks_info, sizeof (*ks_info));
+         goto err_out;
+     }
+ 
+     irport->irport_kstat_info->ks_data = ks_info;
+     irport->irport_kstat_info->ks_private = irport;
+     irport->irport_kstat_info->ks_update = stmf_kstat_rport_update;
+     ident = kmem_alloc(id->ident_length + 1, KM_NOSLEEP);
+     if (ident == NULL) {
+         kstat_delete(irport->irport_kstat_info);
+         irport->irport_kstat_info = NULL;
+         kmem_free(ks_info, sizeof (*ks_info));
+         goto err_out;
+     }
+ 
+     (void) memcpy(ident, id->ident, id->ident_length);
+     ident[id->ident_length] = '\0';
+     kstat_named_init(&ks_info->i_rport_name, "name", KSTAT_DATA_STRING);
+     kstat_named_init(&ks_info->i_protocol, "protocol",
+         KSTAT_DATA_STRING);
+ 
+     kstat_named_setstr(&ks_info->i_rport_name, ident);
+     kstat_named_setstr(&ks_info->i_protocol,
+         protocol_ident[irport->irport_id->protocol_id]);
+     irport->irport_kstat_info->ks_lock = &irport->irport_mutex;
+     irport->irport_info_dirty = B_TRUE;
+     kstat_install(irport->irport_kstat_info);
+ 
+     (void) snprintf(ks_nm, KSTAT_STRLEN, "stmf_rport_io_%"PRIxPTR"",
+         (uintptr_t)irport);
+     irport->irport_kstat_io = kstat_create(STMF_MODULE_NAME, 0, ks_nm,
+         "io", KSTAT_TYPE_IO, 1, 0);
+     if (irport->irport_kstat_io == NULL)
+         goto err_out;
+ 
+     irport->irport_kstat_io->ks_lock = &irport->irport_mutex;
+     kstat_install(irport->irport_kstat_io);
+ 
+     (void) snprintf(ks_nm, KSTAT_STRLEN, "stmf_rport_st_%"PRIxPTR"",
+         (uintptr_t)irport);
+     irport->irport_kstat_estat = kstat_create(STMF_MODULE_NAME, 0, ks_nm,
+         "misc", KSTAT_TYPE_NAMED,
+         sizeof (*ks_estat) / sizeof (kstat_named_t), 0);
+     if (irport->irport_kstat_estat == NULL)
+         goto err_out;
+ 
+     ks_estat = (stmf_kstat_rport_estat_t *)KSTAT_NAMED_PTR(
+         irport->irport_kstat_estat);
+     kstat_named_init(&ks_estat->i_rport_read_latency,
+         "rlatency", KSTAT_DATA_UINT64);
+     kstat_named_init(&ks_estat->i_rport_write_latency,
+         "wlatency", KSTAT_DATA_UINT64);
+     kstat_named_init(&ks_estat->i_nread_tasks, "rntasks",
+         KSTAT_DATA_UINT64);
+     kstat_named_init(&ks_estat->i_nwrite_tasks, "wntasks",
+         KSTAT_DATA_UINT64);
+     irport->irport_kstat_estat->ks_lock = &irport->irport_mutex;
+     kstat_install(irport->irport_kstat_estat);
+ 
+     return;
+ 
+ err_out:
+     (void) memcpy(ks_nm, id->ident, MAX(KSTAT_STRLEN - 1,
+         id->ident_length));
+     ks_nm[id->ident_length] = '\0';
+     cmn_err(CE_WARN, "STMF: remote port kstat creation failed: %s", ks_nm);
+ }
+ 
+ static void
+ stmf_destroy_kstat_rport(stmf_i_remote_port_t *irport)
+ {
+     if (irport->irport_kstat_io != NULL) {
+         kstat_delete(irport->irport_kstat_io);
+     }
+     if (irport->irport_kstat_info != NULL) {
+         stmf_kstat_rport_info_t *ks_info;
+         kstat_named_t *knp;
+         void *ptr;
+         int i;
+ 
+         ks_info = (stmf_kstat_rport_info_t *)KSTAT_NAMED_PTR(
+             irport->irport_kstat_info);
+         kstat_delete(irport->irport_kstat_info);
+         ptr = KSTAT_NAMED_STR_PTR(&ks_info->i_rport_name);
+         kmem_free(ptr, KSTAT_NAMED_STR_BUFLEN(&ks_info->i_rport_name));
+ 
+         for (i = 0, knp = ks_info->i_rport_uinfo;
+             i < STMF_RPORT_INFO_LIMIT; i++, knp++) {
+             ptr = KSTAT_NAMED_STR_PTR(knp);
+             if (ptr != NULL)
+                 kmem_free(ptr, KSTAT_NAMED_STR_BUFLEN(knp));
+         }
+         kmem_free(ks_info, sizeof (*ks_info));
+     }
+ }
+ 
  static stmf_i_remote_port_t *
  stmf_irport_register(scsi_devid_desc_t *rport_devid)
  {
      stmf_i_remote_port_t    *irport;
*** 3476,3485 ****
--- 3697,3707 ----
      if (irport == NULL) {
          mutex_exit(&stmf_state.stmf_lock);
          return (NULL);
      }
  
+     stmf_create_kstat_rport(irport);
      avl_add(&stmf_state.stmf_irportlist, irport);
      mutex_exit(&stmf_state.stmf_lock);
  
      return (irport);
  }
*** 3599,3617 ****
--- 3821,3946 ----
      DTRACE_PROBE2(session__online, stmf_local_port_t *, lport,
          stmf_scsi_session_t *, ss);
      return (STMF_SUCCESS);
  }
  
+ stmf_status_t
+ stmf_add_rport_info(stmf_scsi_session_t *ss,
+     const char *prop_name, const char *prop_value)
+ {
+     stmf_i_scsi_session_t *iss = ss->ss_stmf_private;
+     stmf_i_remote_port_t *irport = iss->iss_irport;
+     kstat_named_t *knp;
+     char *s;
+     int i;
+ 
+     s = strdup(prop_value);
+ 
+     mutex_enter(irport->irport_kstat_info->ks_lock);
+     /* Make sure the caller doesn't try to add already existing property */
+     knp = KSTAT_NAMED_PTR(irport->irport_kstat_info);
+     for (i = 0; i < STMF_KSTAT_RPORT_DATAMAX; i++, knp++) {
+         if (KSTAT_NAMED_STR_PTR(knp) == NULL)
+             break;
+ 
+         ASSERT(strcmp(knp->name, prop_name) != 0);
+     }
+ 
+     if (i == STMF_KSTAT_RPORT_DATAMAX) {
+         mutex_exit(irport->irport_kstat_info->ks_lock);
+         kmem_free(s, strlen(s) + 1);
+         return (STMF_FAILURE);
+     }
+ 
+     irport->irport_info_dirty = B_TRUE;
+     kstat_named_init(knp, prop_name, KSTAT_DATA_STRING);
+     kstat_named_setstr(knp, s);
+     mutex_exit(irport->irport_kstat_info->ks_lock);
+ 
+     return (STMF_SUCCESS);
+ }
+ 
+ void
+ stmf_remove_rport_info(stmf_scsi_session_t *ss,
+     const char *prop_name)
+ {
+     stmf_i_scsi_session_t *iss = ss->ss_stmf_private;
+     stmf_i_remote_port_t *irport = iss->iss_irport;
+     kstat_named_t *knp;
+     char *s;
+     int i;
+     uint32_t len;
+ 
+     mutex_enter(irport->irport_kstat_info->ks_lock);
+     knp = KSTAT_NAMED_PTR(irport->irport_kstat_info);
+     for (i = 0; i < STMF_KSTAT_RPORT_DATAMAX; i++, knp++) {
+         if ((knp->name != NULL) && (strcmp(knp->name, prop_name) == 0))
+             break;
+     }
+ 
+     if (i == STMF_KSTAT_RPORT_DATAMAX) {
+         mutex_exit(irport->irport_kstat_info->ks_lock);
+         return;
+     }
+ 
+     s = KSTAT_NAMED_STR_PTR(knp);
+     len = KSTAT_NAMED_STR_BUFLEN(knp);
+ 
+     for (; i < STMF_KSTAT_RPORT_DATAMAX - 1; i++, knp++) {
+         kstat_named_init(knp, knp[1].name, KSTAT_DATA_STRING);
+         kstat_named_setstr(knp, KSTAT_NAMED_STR_PTR(&knp[1]));
+     }
+     kstat_named_init(knp, "", KSTAT_DATA_STRING);
+ 
+     irport->irport_info_dirty = B_TRUE;
+     mutex_exit(irport->irport_kstat_info->ks_lock);
+     kmem_free(s, len);
+ }
+ 
+ static int
+ stmf_kstat_rport_update(kstat_t *ksp, int rw)
+ {
+     stmf_i_remote_port_t *irport = ksp->ks_private;
+     kstat_named_t *knp;
+     uint_t ndata = 0;
+     size_t dsize = 0;
+     int i;
+ 
+     if (rw == KSTAT_WRITE)
+         return (EACCES);
+ 
+     if (!irport->irport_info_dirty)
+         return (0);
+ 
+     knp = KSTAT_NAMED_PTR(ksp);
+     for (i = 0; i < STMF_KSTAT_RPORT_DATAMAX; i++, knp++) {
+         if (KSTAT_NAMED_STR_PTR(knp) == NULL)
+             break;
+         ndata++;
+         dsize += KSTAT_NAMED_STR_BUFLEN(knp);
+     }
+ 
+     ksp->ks_ndata = ndata;
+     ksp->ks_data_size = sizeof (kstat_named_t) * ndata + dsize;
+     irport->irport_info_dirty = B_FALSE;
+ 
+     return (0);
+ }
+ 
  void
  stmf_deregister_scsi_session(stmf_local_port_t *lport, stmf_scsi_session_t *ss)
  {
      stmf_i_local_port_t *ilport = (stmf_i_local_port_t *)
          lport->lport_stmf_private;
      stmf_i_scsi_session_t *iss, **ppss;
      int found = 0;
      stmf_ic_msg_t *ic_session_dereg;
      stmf_status_t ic_ret = STMF_FAILURE;
+     stmf_lun_map_t *sm;
+     stmf_i_lu_t *ilu;
+     uint16_t n;
+     stmf_lun_map_ent_t *ent;
  
      DTRACE_PROBE2(session__offline, stmf_local_port_t *, lport,
          stmf_scsi_session_t *, ss);
  
      iss = (stmf_i_scsi_session_t *)ss->ss_stmf_private;
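stmf_add_rport_info() and stmf_remove_rport_info() let a port provider attach free-form name/value properties to the remote-port info kstat, one named-string slot per property. A minimal sketch of a hypothetical caller, using only the two entry points declared in the hunk above (the function and property names here are illustrative, not from any real provider):

/* Hypothetical port-provider hooks: publish initiator details on login. */
static void
my_provider_session_online(stmf_scsi_session_t *ss)
{
	/* Claims one free slot; fails once all slots are in use. */
	if (stmf_add_rport_info(ss, "alias", "host-a") != STMF_SUCCESS)
		cmn_err(CE_NOTE, "my_provider: rport info slots exhausted");
}

/* ... and drop the property again on logout. */
static void
my_provider_session_offline(stmf_scsi_session_t *ss)
{
	stmf_remove_rport_info(ss, "alias");
}

Because the kstat is created with KSTAT_FLAG_VAR_SIZE and a ks_update callback, consumers see only the slots currently populated; stmf_kstat_rport_update() recomputes ks_ndata and ks_data_size whenever the dirty flag is set.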
*** 3657,3675 ****
          " session");
      }
      ilport->ilport_nsessions--;
      stmf_irport_deregister(iss->iss_irport);
!     (void) stmf_session_destroy_lun_map(ilport, iss);
      rw_exit(&ilport->ilport_lock);
-     mutex_exit(&stmf_state.stmf_lock);
  
      if (iss->iss_flags & ISS_NULL_TPTID) {
          stmf_remote_port_free(ss->ss_rport);
      }
  }
  
  stmf_i_scsi_session_t *
  stmf_session_id_to_issptr(uint64_t session_id, int stay_locked)
  {
      stmf_i_local_port_t *ilport;
      stmf_i_scsi_session_t *iss;
--- 3986,4035 ----
          " session");
      }
      ilport->ilport_nsessions--;
      stmf_irport_deregister(iss->iss_irport);
!     /*
!      * To avoid a conflict with code updating the session's map,
!      * which grabs only stmf_lock.
!      */
!     sm = iss->iss_sm;
!     iss->iss_sm = NULL;
!     iss->iss_hg = NULL;
!     rw_exit(&ilport->ilport_lock);
+     if (sm->lm_nentries) {
+         for (n = 0; n < sm->lm_nentries; n++) {
+             if ((ent = (stmf_lun_map_ent_t *)sm->lm_plus[n])
+                 != NULL) {
+                 if (ent->ent_itl_datap) {
+                     stmf_do_itl_dereg(ent->ent_lu,
+                         ent->ent_itl_datap,
+                         STMF_ITL_REASON_IT_NEXUS_LOSS);
+                 }
+                 ilu = (stmf_i_lu_t *)
+                     ent->ent_lu->lu_stmf_private;
+                 atomic_dec_32(&ilu->ilu_ref_cnt);
+                 kmem_free(sm->lm_plus[n],
+                     sizeof (stmf_lun_map_ent_t));
+             }
+         }
+         kmem_free(sm->lm_plus,
+             sizeof (stmf_lun_map_ent_t *) * sm->lm_nentries);
+     }
+     kmem_free(sm, sizeof (*sm));
  
      if (iss->iss_flags & ISS_NULL_TPTID) {
          stmf_remote_port_free(ss->ss_rport);
      }
+ 
+     mutex_exit(&stmf_state.stmf_lock);
  }
  
  stmf_i_scsi_session_t *
  stmf_session_id_to_issptr(uint64_t session_id, int stay_locked)
  {
      stmf_i_local_port_t *ilport;
      stmf_i_scsi_session_t *iss;
*** 3861,3918 ****
      kmem_free(itl_list, nmaps * sizeof (stmf_itl_data_t *));
      return (STMF_SUCCESS);
  }
  
- stmf_status_t
- stmf_get_itl_handle(stmf_lu_t *lu, uint8_t *lun, stmf_scsi_session_t *ss,
-     uint64_t session_id, void **itl_handle_retp)
- {
-     stmf_i_scsi_session_t *iss;
-     stmf_lun_map_ent_t *ent;
-     stmf_lun_map_t *lm;
-     stmf_status_t ret;
-     int i;
-     uint16_t n;
- 
-     if (ss == NULL) {
-         iss = stmf_session_id_to_issptr(session_id, 1);
-         if (iss == NULL)
-             return (STMF_NOT_FOUND);
-     } else {
-         iss = (stmf_i_scsi_session_t *)ss->ss_stmf_private;
-         rw_enter(iss->iss_lockp, RW_WRITER);
-     }
- 
-     ent = NULL;
-     if (lun == NULL) {
-         lm = iss->iss_sm;
-         for (i = 0; i < lm->lm_nentries; i++) {
-             if (lm->lm_plus[i] == NULL)
-                 continue;
-             ent = (stmf_lun_map_ent_t *)lm->lm_plus[i];
-             if (ent->ent_lu == lu)
-                 break;
-         }
-     } else {
-         n = ((uint16_t)lun[1] | (((uint16_t)(lun[0] & 0x3F)) << 8));
-         ent = (stmf_lun_map_ent_t *)
-             stmf_get_ent_from_map(iss->iss_sm, n);
-         if (lu && (ent->ent_lu != lu))
-             ent = NULL;
-     }
-     if (ent && ent->ent_itl_datap) {
-         *itl_handle_retp = ent->ent_itl_datap->itl_handle;
-         ret = STMF_SUCCESS;
-     } else {
-         ret = STMF_NOT_FOUND;
-     }
- 
-     rw_exit(iss->iss_lockp);
-     return (ret);
- }
- 
  stmf_data_buf_t *
  stmf_alloc_dbuf(scsi_task_t *task, uint32_t size, uint32_t *pminsize,
      uint32_t flags)
  {
      stmf_i_scsi_task_t *itask =
--- 4221,4230 ----
*** 4042,4058 ****
      if (!lun_map_ent) {
          lu = dlun0;
      } else {
          lu = lun_map_ent->ent_lu;
      }
      ilu = lu->lu_stmf_private;
      if (ilu->ilu_flags & ILU_RESET_ACTIVE) {
          rw_exit(iss->iss_lockp);
          return (NULL);
      }
!     ASSERT(lu == dlun0 || (ilu->ilu_state != STMF_STATE_OFFLINING &&
!         ilu->ilu_state != STMF_STATE_OFFLINE));
      do {
          if (ilu->ilu_free_tasks == NULL) {
              new_task = 1;
              break;
          }
--- 4354,4386 ----
      if (!lun_map_ent) {
          lu = dlun0;
      } else {
          lu = lun_map_ent->ent_lu;
      }
+ 
      ilu = lu->lu_stmf_private;
      if (ilu->ilu_flags & ILU_RESET_ACTIVE) {
          rw_exit(iss->iss_lockp);
          return (NULL);
      }
! 
!     /*
!      * If the LUN is being offlined or is offline, then only commands
!      * that query the LUN are allowed. These are handled in stmf via
!      * the dlun0 vector. It is possible that a race condition will
!      * cause other commands to arrive while the LUN is in the process
!      * of being offlined. Check for those and just let the protocol
!      * stack handle the error.
!      */
!     if ((ilu->ilu_state == STMF_STATE_OFFLINING) ||
!         (ilu->ilu_state == STMF_STATE_OFFLINE)) {
!         if (lu != dlun0) {
!             rw_exit(iss->iss_lockp);
!             return (NULL);
!         }
!     }
! 
      do {
          if (ilu->ilu_free_tasks == NULL) {
              new_task = 1;
              break;
          }
*** 4096,4132 ****
      if (task == NULL) {
          rw_exit(iss->iss_lockp);
          return (NULL);
      }
      task->task_lu = lu;
-     l = task->task_lun_no;
-     l[0] = lun[0];
-     l[1] = lun[1];
-     l[2] = lun[2];
-     l[3] = lun[3];
-     l[4] = lun[4];
-     l[5] = lun[5];
-     l[6] = lun[6];
-     l[7] = lun[7];
      task->task_cdb = (uint8_t *)task->task_port_private;
      if ((ulong_t)(task->task_cdb) & 7ul) {
          task->task_cdb = (uint8_t *)(((ulong_t)
              (task->task_cdb) + 7ul) & ~(7ul));
      }
      itask = (stmf_i_scsi_task_t *)task->task_stmf_private;
      itask->itask_cdb_buf_size = cdb_length;
      mutex_init(&itask->itask_audit_mutex, NULL, MUTEX_DRIVER, NULL);
      }
      task->task_session = ss;
      task->task_lport = lport;
      task->task_cdb_length = cdb_length_in;
      itask->itask_flags = ITASK_IN_TRANSITION;
      itask->itask_waitq_time = 0;
      itask->itask_lu_read_time = itask->itask_lu_write_time = 0;
      itask->itask_lport_read_time = itask->itask_lport_write_time = 0;
      itask->itask_read_xfer = itask->itask_write_xfer = 0;
      itask->itask_audit_index = 0;
      if (new_task) {
          if (lu->lu_task_alloc(task) != STMF_SUCCESS) {
              rw_exit(iss->iss_lockp);
              stmf_free(task);
--- 4424,4472 ----
      if (task == NULL) {
          rw_exit(iss->iss_lockp);
          return (NULL);
      }
      task->task_lu = lu;
      task->task_cdb = (uint8_t *)task->task_port_private;
      if ((ulong_t)(task->task_cdb) & 7ul) {
          task->task_cdb = (uint8_t *)(((ulong_t)
              (task->task_cdb) + 7ul) & ~(7ul));
      }
      itask = (stmf_i_scsi_task_t *)task->task_stmf_private;
      itask->itask_cdb_buf_size = cdb_length;
      mutex_init(&itask->itask_audit_mutex, NULL, MUTEX_DRIVER, NULL);
+     mutex_init(&itask->itask_mutex, NULL, MUTEX_DRIVER, NULL);
      }
+ 
+     /*
+      * Since a LUN can be mapped as different LUN ids to different
+      * initiator groups, we need to set the LUN id for a new task and
+      * reset the LUN id for a reused task.
+      */
+     l = task->task_lun_no;
+     l[0] = lun[0];
+     l[1] = lun[1];
+     l[2] = lun[2];
+     l[3] = lun[3];
+     l[4] = lun[4];
+     l[5] = lun[5];
+     l[6] = lun[6];
+     l[7] = lun[7];
+ 
+     mutex_enter(&itask->itask_mutex);
      task->task_session = ss;
      task->task_lport = lport;
      task->task_cdb_length = cdb_length_in;
      itask->itask_flags = ITASK_IN_TRANSITION;
      itask->itask_waitq_time = 0;
      itask->itask_lu_read_time = itask->itask_lu_write_time = 0;
      itask->itask_lport_read_time = itask->itask_lport_write_time = 0;
      itask->itask_read_xfer = itask->itask_write_xfer = 0;
      itask->itask_audit_index = 0;
+     bzero(&itask->itask_audit_records[0],
+         sizeof (stmf_task_audit_rec_t) * ITASK_TASK_AUDIT_DEPTH);
+     mutex_exit(&itask->itask_mutex);
      if (new_task) {
          if (lu->lu_task_alloc(task) != STMF_SUCCESS) {
              rw_exit(iss->iss_lockp);
              stmf_free(task);
*** 4163,4190 ****
      rw_exit(iss->iss_lockp);
      return (task);
  }
  
  static void
  stmf_task_lu_free(scsi_task_t *task, stmf_i_scsi_session_t *iss)
  {
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
      stmf_i_lu_t *ilu = (stmf_i_lu_t *)task->task_lu->lu_stmf_private;
  
      ASSERT(rw_lock_held(iss->iss_lockp));
      itask->itask_flags = ITASK_IN_FREE_LIST;
      itask->itask_proxy_msg_id = 0;
      mutex_enter(&ilu->ilu_task_lock);
      itask->itask_lu_free_next = ilu->ilu_free_tasks;
      ilu->ilu_free_tasks = itask;
      ilu->ilu_ntasks_free++;
      if (ilu->ilu_ntasks == ilu->ilu_ntasks_free)
          cv_signal(&ilu->ilu_offline_pending_cv);
      mutex_exit(&ilu->ilu_task_lock);
-     atomic_dec_32(itask->itask_ilu_task_cntr);
  }
  
  void
  stmf_task_lu_check_freelist(stmf_i_lu_t *ilu)
  {
--- 4503,4541 ----
      rw_exit(iss->iss_lockp);
      return (task);
  }
  
+ /* ARGSUSED */
  static void
  stmf_task_lu_free(scsi_task_t *task, stmf_i_scsi_session_t *iss)
  {
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
      stmf_i_lu_t *ilu = (stmf_i_lu_t *)task->task_lu->lu_stmf_private;
  
      ASSERT(rw_lock_held(iss->iss_lockp));
+     ASSERT((itask->itask_flags & ITASK_IN_FREE_LIST) == 0);
+     ASSERT((itask->itask_flags & ITASK_IN_WORKER_QUEUE) == 0);
+     ASSERT((itask->itask_flags & ITASK_IN_TRANSITION) == 0);
+     ASSERT((itask->itask_flags & ITASK_KNOWN_TO_LU) == 0);
+     ASSERT(mutex_owned(&itask->itask_mutex));
+ 
      itask->itask_flags = ITASK_IN_FREE_LIST;
+     itask->itask_ncmds = 0;
      itask->itask_proxy_msg_id = 0;
+     atomic_dec_32(itask->itask_ilu_task_cntr);
+     itask->itask_worker_next = NULL;
+     mutex_exit(&itask->itask_mutex);
+ 
      mutex_enter(&ilu->ilu_task_lock);
      itask->itask_lu_free_next = ilu->ilu_free_tasks;
      ilu->ilu_free_tasks = itask;
      ilu->ilu_ntasks_free++;
      if (ilu->ilu_ntasks == ilu->ilu_ntasks_free)
          cv_signal(&ilu->ilu_offline_pending_cv);
      mutex_exit(&ilu->ilu_task_lock);
  }
  
  void
  stmf_task_lu_check_freelist(stmf_i_lu_t *ilu)
  {
*** 4257,4289 ****
          if (ddi_get_lbolt() >= endtime)
              break;
      }
  }
  
! void
  stmf_do_ilu_timeouts(stmf_i_lu_t *ilu)
  {
      clock_t l = ddi_get_lbolt();
      clock_t ps = drv_usectohz(1000000);
      stmf_i_scsi_task_t *itask;
      scsi_task_t *task;
      uint32_t to;
  
!     mutex_enter(&ilu->ilu_task_lock);
      for (itask = ilu->ilu_tasks; itask != NULL;
          itask = itask->itask_lu_next) {
          if (itask->itask_flags & (ITASK_IN_FREE_LIST |
              ITASK_BEING_ABORTED)) {
              continue;
          }
          task = itask->itask_task;
          if (task->task_timeout == 0)
              to = stmf_default_task_timeout;
          else
              to = task->task_timeout;
!         if ((itask->itask_start_time + (to * ps)) > l)
              continue;
          stmf_abort(STMF_QUEUE_TASK_ABORT, task,
              STMF_TIMEOUT, NULL);
      }
      mutex_exit(&ilu->ilu_task_lock);
  }
--- 4608,4687 ----
          if (ddi_get_lbolt() >= endtime)
              break;
      }
  }
  
! /*
!  * Since this method is looking to find tasks that are stuck, lost, or
!  * senile, it should be more willing to give up scanning during this time
!  * period. This is why mutex_tryenter is now used instead of the standard
!  * mutex_enter. There has been at least one case where the following
!  * occurred.
!  *
!  * 1) The iscsit_deferred() method is trying to register a session and
!  *    needs the global lock, which is held.
!  * 2) Another thread which holds the global lock is trying to deregister
!  *    a session and needs the session lock.
!  * 3) A third thread is allocating a stmf task that has grabbed the
!  *    session lock and is trying to grab the lun task lock.
!  * 4) There's a timeout thread that has the lun task lock and is trying
!  *    to grab a specific task lock.
!  * 5) The thread that has the task lock is waiting for the ref count to
!  *    go to zero.
!  * 6) There's a task that would drop the count to zero, but it's in the
!  *    task queue waiting to run and is stuck because the thread in #1 is
!  *    currently blocked.
!  *
!  * This method is number 4 in the above chain of events. Had this code
!  * originally used mutex_tryenter, the chain would have been broken and
!  * the system wouldn't have hung. That is why this method now uses
!  * mutex_tryenter.
!  */
! 
! /* ---- Only one thread calls stmf_do_ilu_timeouts so no lock required ---- */
! typedef struct stmf_bailout_cnt {
!     int no_ilu_lock;
!     int no_task_lock;
!     int tasks_checked;
! } stmf_bailout_cnt_t;
! 
! stmf_bailout_cnt_t stmf_bailout;
! 
! static void
  stmf_do_ilu_timeouts(stmf_i_lu_t *ilu)
  {
      clock_t l = ddi_get_lbolt();
      clock_t ps = drv_usectohz(1000000);
      stmf_i_scsi_task_t *itask;
      scsi_task_t *task;
      uint32_t to;
  
!     if (mutex_tryenter(&ilu->ilu_task_lock) == 0) {
!         stmf_bailout.no_ilu_lock++;
!         return;
!     }
! 
      for (itask = ilu->ilu_tasks; itask != NULL;
          itask = itask->itask_lu_next) {
+         if (mutex_tryenter(&itask->itask_mutex) == 0) {
+             stmf_bailout.no_task_lock++;
+             continue;
+         }
+ 
+         stmf_bailout.tasks_checked++;
          if (itask->itask_flags & (ITASK_IN_FREE_LIST |
              ITASK_BEING_ABORTED)) {
+             mutex_exit(&itask->itask_mutex);
              continue;
          }
+ 
          task = itask->itask_task;
          if (task->task_timeout == 0)
              to = stmf_default_task_timeout;
          else
              to = task->task_timeout;
! 
!         if ((itask->itask_start_time + (to * ps)) > l) {
!             mutex_exit(&itask->itask_mutex);
              continue;
+         }
+         mutex_exit(&itask->itask_mutex);
          stmf_abort(STMF_QUEUE_TASK_ABORT, task,
              STMF_TIMEOUT, NULL);
      }
      mutex_exit(&ilu->ilu_task_lock);
  }
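The tryenter-and-bail pattern in the comment above generalizes: a low-priority scanner that cannot place itself safely in the lock order simply skips contended objects and counts the misses, rather than joining a blocking chain. A minimal, self-contained C sketch of the same idea, assuming ordinary illumos kernel mutexes (the names here are illustrative, not from stmf.c):

/* Illustrative only: a background scanner that refuses to block. */
typedef struct obj {
	struct obj	*o_next;
	kmutex_t	o_lock;
} obj_t;

typedef struct scan_stats {
	int	skipped;	/* objects we could not lock */
	int	checked;	/* objects actually examined */
} scan_stats_t;

static scan_stats_t scan_stats;

static void
scan_objects(obj_t *list)
{
	obj_t *o;

	for (o = list; o != NULL; o = o->o_next) {
		if (mutex_tryenter(&o->o_lock) == 0) {
			/* Contended: skip it, never join a blocking chain. */
			scan_stats.skipped++;
			continue;
		}
		scan_stats.checked++;
		/* ... examine object state under o_lock ... */
		mutex_exit(&o->o_lock);
	}
}

The trade-off is that a skipped object is only re-examined on the next scan pass, which is acceptable for a timeout sweep but not for anything that must observe every object exactly once.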
*** 4334,4348 ****
  {
      stmf_i_lu_t *ilu = (stmf_i_lu_t *)lu->lu_stmf_private;
      stmf_i_scsi_task_t *itask;
  
      mutex_enter(&ilu->ilu_task_lock);
- 
      for (itask = ilu->ilu_tasks; itask != NULL;
          itask = itask->itask_lu_next) {
!         if (itask->itask_flags & ITASK_IN_FREE_LIST)
              continue;
          if (itask->itask_task == tm_task)
              continue;
          stmf_abort(STMF_QUEUE_TASK_ABORT, itask->itask_task, s, NULL);
      }
      mutex_exit(&ilu->ilu_task_lock);
--- 4732,4749 ----
  {
      stmf_i_lu_t *ilu = (stmf_i_lu_t *)lu->lu_stmf_private;
      stmf_i_scsi_task_t *itask;
  
      mutex_enter(&ilu->ilu_task_lock);
      for (itask = ilu->ilu_tasks; itask != NULL;
          itask = itask->itask_lu_next) {
!         mutex_enter(&itask->itask_mutex);
!         if (itask->itask_flags & ITASK_IN_FREE_LIST) {
!             mutex_exit(&itask->itask_mutex);
              continue;
+         }
+         mutex_exit(&itask->itask_mutex);
          if (itask->itask_task == tm_task)
              continue;
          stmf_abort(STMF_QUEUE_TASK_ABORT, itask->itask_task, s, NULL);
      }
      mutex_exit(&ilu->ilu_task_lock);
*** 4394,4406 ****
      stmf_local_port_t *lport = task->task_lport;
      stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *)
          task->task_stmf_private;
      stmf_i_scsi_session_t *iss = (stmf_i_scsi_session_t *)
          task->task_session->ss_stmf_private;
  
      stmf_task_audit(itask, TE_TASK_FREE, CMD_OR_IOF_NA, NULL);
!     stmf_free_task_bufs(itask, lport);
      stmf_itl_task_done(itask);
      DTRACE_PROBE2(stmf__task__end, scsi_task_t *, task,
          hrtime_t,
          itask->itask_done_timestamp - itask->itask_start_timestamp);
--- 4795,4810 ----
      stmf_local_port_t *lport = task->task_lport;
      stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *)
          task->task_stmf_private;
      stmf_i_scsi_session_t *iss = (stmf_i_scsi_session_t *)
          task->task_session->ss_stmf_private;
+     stmf_lu_t *lu = task->task_lu;
  
      stmf_task_audit(itask, TE_TASK_FREE, CMD_OR_IOF_NA, NULL);
!     ASSERT(mutex_owned(&itask->itask_mutex));
!     if ((lu != NULL) && (lu->lu_task_done != NULL))
!         lu->lu_task_done(task);
      stmf_free_task_bufs(itask, lport);
      stmf_itl_task_done(itask);
      DTRACE_PROBE2(stmf__task__end, scsi_task_t *, task,
          hrtime_t,
          itask->itask_done_timestamp - itask->itask_start_timestamp);
*** 4410,4420 ****
--- 4814,4831 ----
          stmf_release_itl_handle(task->task_lu, itask->itask_itl_datap);
      }
      }
  
+     /*
+      * To prevent a deadlock condition, we must release the itask_mutex,
+      * grab a reader lock on iss_lockp, and then reacquire the
+      * itask_mutex.
+      */
+     mutex_exit(&itask->itask_mutex);
      rw_enter(iss->iss_lockp, RW_READER);
+     mutex_enter(&itask->itask_mutex);
+ 
      lport->lport_task_free(task);
      if (itask->itask_worker) {
          atomic_dec_32(&stmf_cur_ntasks);
          atomic_dec_32(&itask->itask_worker->worker_ref_count);
      }
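The drop-and-reacquire dance in this hunk is the standard fix for a lock-order inversion: if the established order is iss_lockp before itask_mutex, a path already holding only itask_mutex must back out of it before taking iss_lockp, then re-take it. A minimal C sketch of the invariant, with illustrative lock names (not the stmf.c code):

/*
 * Lock-ordering sketch: lock A (outer) must always be taken before
 * lock B (inner). A path that holds only B and needs both must back
 * out of B first; otherwise two threads can deadlock:
 * T1 holds A, wants B; T2 holds B, wants A.
 */
static void
take_both_from_inner(krwlock_t *a, kmutex_t *b)
{
	ASSERT(mutex_owned(b));
	mutex_exit(b);			/* back out of the inner lock */
	rw_enter(a, RW_READER);		/* outer lock first ... */
	mutex_enter(b);			/* ... then re-take the inner lock */
	/* NB: any state read under b before the drop must be revalidated. */
}

The revalidation note is the price of this pattern: while B is dropped, another thread may have changed the task state, so flags checked earlier cannot be trusted afterward.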
*** 4431,4502 ****
  {
      stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *)
          task->task_stmf_private;
      stmf_i_lu_t *ilu = (stmf_i_lu_t *)task->task_lu->lu_stmf_private;
      int nv;
!     uint32_t old, new;
      uint32_t ct;
!     stmf_worker_t *w, *w1;
      uint8_t tm;
  
      if (task->task_max_nbufs > 4)
          task->task_max_nbufs = 4;
      task->task_cur_nbufs = 0;
      /* Latest value of currently running tasks */
      ct = atomic_inc_32_nv(&stmf_cur_ntasks);
  
      /* Select the next worker using round robin */
!     nv = (int)atomic_inc_32_nv((uint32_t *)&stmf_worker_sel_counter);
!     if (nv >= stmf_nworkers_accepting_cmds) {
!         int s = nv;
!         do {
!             nv -= stmf_nworkers_accepting_cmds;
!         } while (nv >= stmf_nworkers_accepting_cmds);
!         if (nv < 0)
!             nv = 0;
!         /* Its ok if this cas fails */
!         (void) atomic_cas_32((uint32_t *)&stmf_worker_sel_counter,
!             s, nv);
!     }
!     w = &stmf_workers[nv];
! 
!     /*
!      * A worker can be pinned by interrupt. So select the next one
!      * if it has lower load.
!      */
!     if ((nv + 1) >= stmf_nworkers_accepting_cmds) {
!         w1 = stmf_workers;
!     } else {
!         w1 = &stmf_workers[nv + 1];
!     }
!     if (w1->worker_queue_depth < w->worker_queue_depth)
!         w = w1;
  
      mutex_enter(&w->worker_lock);
!     if (((w->worker_flags & STMF_WORKER_STARTED) == 0) ||
!         (w->worker_flags & STMF_WORKER_TERMINATE)) {
!         /*
!          * Maybe we are in the middle of a change. Just go to
!          * the 1st worker.
!          */
!         mutex_exit(&w->worker_lock);
!         w = stmf_workers;
!         mutex_enter(&w->worker_lock);
!     }
      itask->itask_worker = w;
  
      /*
       * Track max system load inside the worker as we already have the
       * worker lock (no point implementing another lock). The service
       * thread will do the comparisons and figure out the max overall
       * system load.
       */
      if (w->worker_max_sys_qdepth_pu < ct)
          w->worker_max_sys_qdepth_pu = ct;
  
!     do {
!         old = new = itask->itask_flags;
!         new |= ITASK_KNOWN_TO_TGT_PORT | ITASK_IN_WORKER_QUEUE;
          if (task->task_mgmt_function) {
              tm = task->task_mgmt_function;
              if ((tm == TM_TARGET_RESET) ||
                  (tm == TM_TARGET_COLD_RESET) ||
                  (tm == TM_TARGET_WARM_RESET)) {
--- 4842,4896 ----
  {
      stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *)
          task->task_stmf_private;
      stmf_i_lu_t *ilu = (stmf_i_lu_t *)task->task_lu->lu_stmf_private;
      int nv;
!     uint32_t new;
      uint32_t ct;
!     stmf_worker_t *w;
      uint8_t tm;
  
      if (task->task_max_nbufs > 4)
          task->task_max_nbufs = 4;
      task->task_cur_nbufs = 0;
      /* Latest value of currently running tasks */
      ct = atomic_inc_32_nv(&stmf_cur_ntasks);
  
      /* Select the next worker using round robin */
!     mutex_enter(&stmf_worker_sel_mx);
!     stmf_worker_sel_counter++;
!     if (stmf_worker_sel_counter >= stmf_nworkers)
!         stmf_worker_sel_counter = 0;
!     nv = stmf_worker_sel_counter;
! 
!     /* if the selected worker is not idle then bump to the next worker */
!     if (stmf_workers[nv].worker_queue_depth > 0) {
!         stmf_worker_sel_counter++;
!         if (stmf_worker_sel_counter >= stmf_nworkers)
!             stmf_worker_sel_counter = 0;
!         nv = stmf_worker_sel_counter;
!     }
!     mutex_exit(&stmf_worker_sel_mx);
+ 
+     w = &stmf_workers[nv];
+ 
+     mutex_enter(&itask->itask_mutex);
      mutex_enter(&w->worker_lock);
!     itask->itask_worker = w;
+ 
      /*
       * Track max system load inside the worker as we already have the
       * worker lock (no point implementing another lock). The service
       * thread will do the comparisons and figure out the max overall
       * system load.
       */
      if (w->worker_max_sys_qdepth_pu < ct)
          w->worker_max_sys_qdepth_pu = ct;
  
!     new = itask->itask_flags;
!     new |= ITASK_KNOWN_TO_TGT_PORT;
      if (task->task_mgmt_function) {
          tm = task->task_mgmt_function;
          if ((tm == TM_TARGET_RESET) ||
              (tm == TM_TARGET_COLD_RESET) ||
              (tm == TM_TARGET_WARM_RESET)) {
*** 4504,4532 ****
          }
      } else if (task->task_cdb[0] == SCMD_REPORT_LUNS) {
          new |= ITASK_DEFAULT_HANDLING;
      }
      new &= ~ITASK_IN_TRANSITION;
!     } while (atomic_cas_32(&itask->itask_flags, old, new) != old);
  
      stmf_itl_task_start(itask);
  
-     itask->itask_worker_next = NULL;
-     if (w->worker_task_tail) {
-         w->worker_task_tail->itask_worker_next = itask;
-     } else {
-         w->worker_task_head = itask;
-     }
-     w->worker_task_tail = itask;
-     if (++(w->worker_queue_depth) > w->worker_max_qdepth_pu) {
-         w->worker_max_qdepth_pu = w->worker_queue_depth;
-     }
-     /* Measure task waitq time */
-     itask->itask_waitq_enter_timestamp = gethrtime();
-     atomic_inc_32(&w->worker_ref_count);
      itask->itask_cmd_stack[0] = ITASK_CMD_NEW_TASK;
      itask->itask_ncmds = 1;
      stmf_task_audit(itask, TE_TASK_START, CMD_OR_IOF_NA, dbuf);
      if (dbuf) {
          itask->itask_allocated_buf_map = 1;
          itask->itask_dbufs[0] = dbuf;
          dbuf->db_handle = 0;
--- 4898,4921 ----
          }
      } else if (task->task_cdb[0] == SCMD_REPORT_LUNS) {
          new |= ITASK_DEFAULT_HANDLING;
      }
      new &= ~ITASK_IN_TRANSITION;
!     itask->itask_flags = new;
  
      stmf_itl_task_start(itask);
  
      itask->itask_cmd_stack[0] = ITASK_CMD_NEW_TASK;
      itask->itask_ncmds = 1;
+ 
+     if ((task->task_flags & TF_INITIAL_BURST) &&
+         !(curthread->t_flag & T_INTR_THREAD)) {
+         stmf_update_kstat_lu_io(task, dbuf);
+         stmf_update_kstat_lport_io(task, dbuf);
+         stmf_update_kstat_rport_io(task, dbuf);
+     }
+ 
      stmf_task_audit(itask, TE_TASK_START, CMD_OR_IOF_NA, dbuf);
      if (dbuf) {
          itask->itask_allocated_buf_map = 1;
          itask->itask_dbufs[0] = dbuf;
          dbuf->db_handle = 0;
*** 4533,4549 ****
      } else {
          itask->itask_allocated_buf_map = 0;
          itask->itask_dbufs[0] = NULL;
      }
  
!     if ((w->worker_flags & STMF_WORKER_ACTIVE) == 0) {
!         w->worker_signal_timestamp = gethrtime();
!         DTRACE_PROBE2(worker__signal, stmf_worker_t *, w,
!             scsi_task_t *, task);
!         cv_signal(&w->worker_cv);
!     }
      mutex_exit(&w->worker_lock);
  
      /*
       * This can only happen if during stmf_task_alloc(), ILU_RESET_ACTIVE
       * was set between checking of ILU_RESET_ACTIVE and clearing of the
       * ITASK_IN_FREE_LIST flag. Take care of these "sneaked-in" tasks here.
--- 4922,4935 ----
      } else {
          itask->itask_allocated_buf_map = 0;
          itask->itask_dbufs[0] = NULL;
      }
  
!     STMF_ENQUEUE_ITASK(w, itask);
! 
      mutex_exit(&w->worker_lock);
+     mutex_exit(&itask->itask_mutex);
  
      /*
       * This can only happen if during stmf_task_alloc(), ILU_RESET_ACTIVE
       * was set between checking of ILU_RESET_ACTIVE and clearing of the
       * ITASK_IN_FREE_LIST flag. Take care of these "sneaked-in" tasks here.
*** 4595,4624 ****
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
  
      stmf_task_audit(itask, TE_XFER_START, ioflags, dbuf);
  
      if (ioflags & STMF_IOF_LU_DONE) {
!         uint32_t new, old;
!         do {
!             new = old = itask->itask_flags;
!             if (new & ITASK_BEING_ABORTED)
                  return (STMF_ABORTED);
-             new &= ~ITASK_KNOWN_TO_LU;
-         } while (atomic_cas_32(&itask->itask_flags, old, new) != old);
      }
!     if (itask->itask_flags & ITASK_BEING_ABORTED)
          return (STMF_ABORTED);
  #ifdef DEBUG
      if (!(ioflags & STMF_IOF_STATS_ONLY) && stmf_drop_buf_counter > 0) {
!         if (atomic_dec_32_nv(&stmf_drop_buf_counter) == 1)
              return (STMF_SUCCESS);
      }
  #endif
  
      stmf_update_kstat_lu_io(task, dbuf);
      stmf_update_kstat_lport_io(task, dbuf);
      stmf_lport_xfer_start(itask, dbuf);
      if (ioflags & STMF_IOF_STATS_ONLY) {
          stmf_lport_xfer_done(itask, dbuf);
          return (STMF_SUCCESS);
      }
--- 4981,5014 ----
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
  
      stmf_task_audit(itask, TE_XFER_START, ioflags, dbuf);
  
+     mutex_enter(&itask->itask_mutex);
      if (ioflags & STMF_IOF_LU_DONE) {
!         if (itask->itask_flags & ITASK_BEING_ABORTED) {
!             mutex_exit(&itask->itask_mutex);
              return (STMF_ABORTED);
          }
!         itask->itask_flags &= ~ITASK_KNOWN_TO_LU;
!     }
!     if ((itask->itask_flags & ITASK_BEING_ABORTED) != 0) {
!         mutex_exit(&itask->itask_mutex);
          return (STMF_ABORTED);
+     }
+     mutex_exit(&itask->itask_mutex);
+ 
  #ifdef DEBUG
      if (!(ioflags & STMF_IOF_STATS_ONLY) && stmf_drop_buf_counter > 0) {
!         if (atomic_dec_32_nv((uint32_t *)&stmf_drop_buf_counter) == 1)
              return (STMF_SUCCESS);
      }
  #endif
  
      stmf_update_kstat_lu_io(task, dbuf);
      stmf_update_kstat_lport_io(task, dbuf);
+     stmf_update_kstat_rport_io(task, dbuf);
      stmf_lport_xfer_start(itask, dbuf);
      if (ioflags & STMF_IOF_STATS_ONLY) {
          stmf_lport_xfer_done(itask, dbuf);
          return (STMF_SUCCESS);
      }
*** 4644,4654 ****
  {
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
      stmf_i_local_port_t *ilport;
      stmf_worker_t *w = itask->itask_worker;
!     uint32_t new, old;
      uint8_t update_queue_flags, free_it, queue_it;
  
      stmf_lport_xfer_done(itask, dbuf);
  
      stmf_task_audit(itask, TE_XFER_DONE, iof, dbuf);
--- 5034,5044 ----
  {
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
      stmf_i_local_port_t *ilport;
      stmf_worker_t *w = itask->itask_worker;
!     uint32_t new;
      uint8_t update_queue_flags, free_it, queue_it;
  
      stmf_lport_xfer_done(itask, dbuf);
  
      stmf_task_audit(itask, TE_XFER_DONE, iof, dbuf);
*** 4665,4679 ****
          cmn_err(CE_PANIC, "Unexpected xfer completion task %p dbuf %p",
              (void *)task, (void *)dbuf);
          return;
      }
  
      mutex_enter(&w->worker_lock);
!     do {
!         new = old = itask->itask_flags;
!         if (old & ITASK_BEING_ABORTED) {
              mutex_exit(&w->worker_lock);
              return;
          }
          free_it = 0;
          if (iof & STMF_IOF_LPORT_DONE) {
              new &= ~ITASK_KNOWN_TO_TGT_PORT;
--- 5055,5070 ----
          cmn_err(CE_PANIC, "Unexpected xfer completion task %p dbuf %p",
              (void *)task, (void *)dbuf);
          return;
      }
  
+     mutex_enter(&itask->itask_mutex);
      mutex_enter(&w->worker_lock);
!     new = itask->itask_flags;
!     if (itask->itask_flags & ITASK_BEING_ABORTED) {
          mutex_exit(&w->worker_lock);
+         mutex_exit(&itask->itask_mutex);
          return;
      }
      free_it = 0;
      if (iof & STMF_IOF_LPORT_DONE) {
          new &= ~ITASK_KNOWN_TO_TGT_PORT;
*** 4686,4742 ****
      * just update the buffer information by grabbing the
      * worker lock. If the task is not known to LU,
      * completed/aborted, then see if we need to
      * free this task.
      */
!     if (old & ITASK_KNOWN_TO_LU) {
          free_it = 0;
          update_queue_flags = 1;
!         if (old & ITASK_IN_WORKER_QUEUE) {
              queue_it = 0;
          } else {
              queue_it = 1;
-             new |= ITASK_IN_WORKER_QUEUE;
          }
      } else {
          update_queue_flags = 0;
          queue_it = 0;
      }
!     } while (atomic_cas_32(&itask->itask_flags, old, new) != old);
  
      if (update_queue_flags) {
          uint8_t cmd = (dbuf->db_handle << 5) | ITASK_CMD_DATA_XFER_DONE;
          ASSERT(itask->itask_ncmds < ITASK_MAX_NCMDS);
          itask->itask_cmd_stack[itask->itask_ncmds++] = cmd;
          if (queue_it) {
!             itask->itask_worker_next = NULL;
!             if (w->worker_task_tail) {
!                 w->worker_task_tail->itask_worker_next = itask;
!             } else {
!                 w->worker_task_head = itask;
!             }
!             w->worker_task_tail = itask;
!             /* Measure task waitq time */
!             itask->itask_waitq_enter_timestamp = gethrtime();
!             if (++(w->worker_queue_depth) >
!                 w->worker_max_qdepth_pu) {
!                 w->worker_max_qdepth_pu = w->worker_queue_depth;
!             }
!             if ((w->worker_flags & STMF_WORKER_ACTIVE) == 0)
!                 cv_signal(&w->worker_cv);
!         }
!     }
      mutex_exit(&w->worker_lock);
  
      if (free_it) {
          if ((itask->itask_flags & (ITASK_KNOWN_TO_LU |
              ITASK_KNOWN_TO_TGT_PORT | ITASK_IN_WORKER_QUEUE |
              ITASK_BEING_ABORTED)) == 0) {
              stmf_task_free(task);
          }
      }
  }
  
  stmf_status_t
  stmf_send_scsi_status(scsi_task_t *task, uint32_t ioflags)
  {
--- 5077,5125 ----
      * just update the buffer information by grabbing the
      * worker lock. If the task is not known to LU,
      * completed/aborted, then see if we need to
      * free this task.
      */
!     if (itask->itask_flags & ITASK_KNOWN_TO_LU) {
          free_it = 0;
          update_queue_flags = 1;
!         if (itask->itask_flags & ITASK_IN_WORKER_QUEUE) {
              queue_it = 0;
          } else {
              queue_it = 1;
          }
      } else {
          update_queue_flags = 0;
          queue_it = 0;
      }
!     itask->itask_flags = new;
  
      if (update_queue_flags) {
          uint8_t cmd = (dbuf->db_handle << 5) | ITASK_CMD_DATA_XFER_DONE;
+ 
+         ASSERT((itask->itask_flags & ITASK_IN_FREE_LIST) == 0);
          ASSERT(itask->itask_ncmds < ITASK_MAX_NCMDS);
+ 
          itask->itask_cmd_stack[itask->itask_ncmds++] = cmd;
          if (queue_it) {
!             STMF_ENQUEUE_ITASK(w, itask);
          }
+         mutex_exit(&w->worker_lock);
+         mutex_exit(&itask->itask_mutex);
+         return;
+     }
+ 
      mutex_exit(&w->worker_lock);
  
      if (free_it) {
          if ((itask->itask_flags & (ITASK_KNOWN_TO_LU |
              ITASK_KNOWN_TO_TGT_PORT | ITASK_IN_WORKER_QUEUE |
              ITASK_BEING_ABORTED)) == 0) {
              stmf_task_free(task);
+             return;
          }
      }
+     mutex_exit(&itask->itask_mutex);
  }
  
  stmf_status_t
  stmf_send_scsi_status(scsi_task_t *task, uint32_t ioflags)
  {
*** 4745,4770 ****
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
  
      stmf_task_audit(itask, TE_SEND_STATUS, ioflags, NULL);
  
      if (ioflags & STMF_IOF_LU_DONE) {
!         uint32_t new, old;
!         do {
!             new = old = itask->itask_flags;
!             if (new & ITASK_BEING_ABORTED)
                  return (STMF_ABORTED);
-             new &= ~ITASK_KNOWN_TO_LU;
-         } while (atomic_cas_32(&itask->itask_flags, old, new) != old);
      }
  
      if (!(itask->itask_flags & ITASK_KNOWN_TO_TGT_PORT)) {
          return (STMF_SUCCESS);
      }
  
!     if (itask->itask_flags & ITASK_BEING_ABORTED)
          return (STMF_ABORTED);
  
      if (task->task_additional_flags & TASK_AF_NO_EXPECTED_XFER_LENGTH) {
          task->task_status_ctrl = 0;
          task->task_resid = 0;
      } else if (task->task_cmd_xfer_length >
--- 5128,5156 ----
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
  
      stmf_task_audit(itask, TE_SEND_STATUS, ioflags, NULL);
  
+     mutex_enter(&itask->itask_mutex);
      if (ioflags & STMF_IOF_LU_DONE) {
!         if (itask->itask_flags & ITASK_BEING_ABORTED) {
!             mutex_exit(&itask->itask_mutex);
              return (STMF_ABORTED);
          }
+         itask->itask_flags &= ~ITASK_KNOWN_TO_LU;
+     }
  
      if (!(itask->itask_flags & ITASK_KNOWN_TO_TGT_PORT)) {
+         mutex_exit(&itask->itask_mutex);
          return (STMF_SUCCESS);
      }
  
!     if (itask->itask_flags & ITASK_BEING_ABORTED) {
!         mutex_exit(&itask->itask_mutex);
          return (STMF_ABORTED);
+     }
+     mutex_exit(&itask->itask_mutex);
  
      if (task->task_additional_flags & TASK_AF_NO_EXPECTED_XFER_LENGTH) {
          task->task_status_ctrl = 0;
          task->task_resid = 0;
      } else if (task->task_cmd_xfer_length >
*** 4788,4807 ****
  stmf_send_status_done(scsi_task_t *task, stmf_status_t s, uint32_t iof)
  {
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
      stmf_worker_t *w = itask->itask_worker;
!     uint32_t new, old;
      uint8_t free_it, queue_it;
  
      stmf_task_audit(itask, TE_SEND_STATUS_DONE, iof, NULL);
  
      mutex_enter(&w->worker_lock);
!     do {
!         new = old = itask->itask_flags;
!         if (old & ITASK_BEING_ABORTED) {
              mutex_exit(&w->worker_lock);
              return;
          }
          free_it = 0;
          if (iof & STMF_IOF_LPORT_DONE) {
              new &= ~ITASK_KNOWN_TO_TGT_PORT;
--- 5174,5194 ----
  stmf_send_status_done(scsi_task_t *task, stmf_status_t s, uint32_t iof)
  {
      stmf_i_scsi_task_t *itask =
          (stmf_i_scsi_task_t *)task->task_stmf_private;
      stmf_worker_t *w = itask->itask_worker;
!     uint32_t new;
      uint8_t free_it, queue_it;
  
      stmf_task_audit(itask, TE_SEND_STATUS_DONE, iof, NULL);
  
+     mutex_enter(&itask->itask_mutex);
      mutex_enter(&w->worker_lock);
!     new = itask->itask_flags;
!     if (itask->itask_flags & ITASK_BEING_ABORTED) {
          mutex_exit(&w->worker_lock);
+         mutex_exit(&itask->itask_mutex);
          return;
      }
      free_it = 0;
      if (iof & STMF_IOF_LPORT_DONE) {
          new &= ~ITASK_KNOWN_TO_TGT_PORT;
*** 4813,4961 **** * just update the buffer information by grabbing the * worker lock. If the task is not known to LU, * completed/aborted, then see if we need to * free this task. */ ! if (old & ITASK_KNOWN_TO_LU) { free_it = 0; queue_it = 1; ! if (old & ITASK_IN_WORKER_QUEUE) { cmn_err(CE_PANIC, "status completion received" " when task is already in worker queue " " task = %p", (void *)task); } - new |= ITASK_IN_WORKER_QUEUE; } else { queue_it = 0; } ! } while (atomic_cas_32(&itask->itask_flags, old, new) != old); task->task_completion_status = s; - if (queue_it) { ASSERT(itask->itask_ncmds < ITASK_MAX_NCMDS); itask->itask_cmd_stack[itask->itask_ncmds++] = ITASK_CMD_STATUS_DONE; ! itask->itask_worker_next = NULL; ! if (w->worker_task_tail) { ! w->worker_task_tail->itask_worker_next = itask; ! } else { ! w->worker_task_head = itask; } ! w->worker_task_tail = itask; ! /* Measure task waitq time */ ! itask->itask_waitq_enter_timestamp = gethrtime(); ! if (++(w->worker_queue_depth) > w->worker_max_qdepth_pu) { ! w->worker_max_qdepth_pu = w->worker_queue_depth; ! } ! if ((w->worker_flags & STMF_WORKER_ACTIVE) == 0) ! cv_signal(&w->worker_cv); ! } mutex_exit(&w->worker_lock); if (free_it) { if ((itask->itask_flags & (ITASK_KNOWN_TO_LU | ITASK_KNOWN_TO_TGT_PORT | ITASK_IN_WORKER_QUEUE | ITASK_BEING_ABORTED)) == 0) { stmf_task_free(task); } else { cmn_err(CE_PANIC, "LU is done with the task but LPORT " " is not done, itask %p itask_flags %x", (void *)itask, itask->itask_flags); } } } void stmf_task_lu_done(scsi_task_t *task) { stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *)task->task_stmf_private; stmf_worker_t *w = itask->itask_worker; - uint32_t new, old; mutex_enter(&w->worker_lock); ! do { ! new = old = itask->itask_flags; ! if (old & ITASK_BEING_ABORTED) { mutex_exit(&w->worker_lock); return; } ! if (old & ITASK_IN_WORKER_QUEUE) { cmn_err(CE_PANIC, "task_lu_done received" " when task is in worker queue " " task = %p", (void *)task); } ! new &= ~ITASK_KNOWN_TO_LU; ! } while (atomic_cas_32(&itask->itask_flags, old, new) != old); mutex_exit(&w->worker_lock); - if ((itask->itask_flags & (ITASK_KNOWN_TO_LU | ITASK_KNOWN_TO_TGT_PORT | ITASK_IN_WORKER_QUEUE | ITASK_BEING_ABORTED)) == 0) { stmf_task_free(task); } else { cmn_err(CE_PANIC, "stmf_lu_done should be the last stage but " " the task is still not done, task = %p", (void *)task); } } void stmf_queue_task_for_abort(scsi_task_t *task, stmf_status_t s) { stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *)task->task_stmf_private; stmf_worker_t *w; - uint32_t old, new; stmf_task_audit(itask, TE_TASK_ABORT, CMD_OR_IOF_NA, NULL); ! do { ! old = new = itask->itask_flags; ! if ((old & ITASK_BEING_ABORTED) || ! ((old & (ITASK_KNOWN_TO_TGT_PORT | ITASK_KNOWN_TO_LU)) == 0)) { return; } ! new |= ITASK_BEING_ABORTED; ! } while (atomic_cas_32(&itask->itask_flags, old, new) != old); task->task_completion_status = s; - itask->itask_start_time = ddi_get_lbolt(); if (((w = itask->itask_worker) == NULL) || (itask->itask_flags & ITASK_IN_TRANSITION)) { return; } /* Queue it and get out */ - mutex_enter(&w->worker_lock); if (itask->itask_flags & ITASK_IN_WORKER_QUEUE) { ! mutex_exit(&w->worker_lock); return; } ! atomic_or_32(&itask->itask_flags, ITASK_IN_WORKER_QUEUE); ! itask->itask_worker_next = NULL; ! if (w->worker_task_tail) { ! w->worker_task_tail->itask_worker_next = itask; ! } else { ! w->worker_task_head = itask; ! } ! w->worker_task_tail = itask; ! if (++(w->worker_queue_depth) > w->worker_max_qdepth_pu) { ! 
w->worker_max_qdepth_pu = w->worker_queue_depth; ! } ! if ((w->worker_flags & STMF_WORKER_ACTIVE) == 0) ! cv_signal(&w->worker_cv); mutex_exit(&w->worker_lock); } void stmf_abort(int abort_cmd, scsi_task_t *task, stmf_status_t s, void *arg) { stmf_i_scsi_task_t *itask = NULL; ! uint32_t old, new, f, rf; DTRACE_PROBE2(scsi__task__abort, scsi_task_t *, task, stmf_status_t, s); switch (abort_cmd) { --- 5200,5326 ---- * just update the buffer information by grabbing the * worker lock. If the task is not known to LU, * completed/aborted, then see if we need to * free this task. */ ! if (itask->itask_flags & ITASK_KNOWN_TO_LU) { free_it = 0; queue_it = 1; ! if (itask->itask_flags & ITASK_IN_WORKER_QUEUE) { cmn_err(CE_PANIC, "status completion received" " when task is already in worker queue " " task = %p", (void *)task); } } else { queue_it = 0; } ! itask->itask_flags = new; task->task_completion_status = s; if (queue_it) { ASSERT(itask->itask_ncmds < ITASK_MAX_NCMDS); itask->itask_cmd_stack[itask->itask_ncmds++] = ITASK_CMD_STATUS_DONE; ! ! STMF_ENQUEUE_ITASK(w, itask); ! mutex_exit(&w->worker_lock); ! mutex_exit(&itask->itask_mutex); ! return; } ! mutex_exit(&w->worker_lock); if (free_it) { if ((itask->itask_flags & (ITASK_KNOWN_TO_LU | ITASK_KNOWN_TO_TGT_PORT | ITASK_IN_WORKER_QUEUE | ITASK_BEING_ABORTED)) == 0) { stmf_task_free(task); + return; } else { cmn_err(CE_PANIC, "LU is done with the task but LPORT " " is not done, itask %p itask_flags %x", (void *)itask, itask->itask_flags); } } + mutex_exit(&itask->itask_mutex); } void stmf_task_lu_done(scsi_task_t *task) { stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *)task->task_stmf_private; stmf_worker_t *w = itask->itask_worker; + mutex_enter(&itask->itask_mutex); mutex_enter(&w->worker_lock); ! if (itask->itask_flags & ITASK_BEING_ABORTED) { mutex_exit(&w->worker_lock); + mutex_exit(&itask->itask_mutex); return; } ! if (itask->itask_flags & ITASK_IN_WORKER_QUEUE) { cmn_err(CE_PANIC, "task_lu_done received" " when task is in worker queue " " task = %p", (void *)task); } ! itask->itask_flags &= ~ITASK_KNOWN_TO_LU; mutex_exit(&w->worker_lock); if ((itask->itask_flags & (ITASK_KNOWN_TO_LU | ITASK_KNOWN_TO_TGT_PORT | ITASK_IN_WORKER_QUEUE | ITASK_BEING_ABORTED)) == 0) { stmf_task_free(task); + return; } else { cmn_err(CE_PANIC, "stmf_lu_done should be the last stage but " " the task is still not done, task = %p", (void *)task); } + mutex_exit(&itask->itask_mutex); } void stmf_queue_task_for_abort(scsi_task_t *task, stmf_status_t s) { stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *)task->task_stmf_private; stmf_worker_t *w; stmf_task_audit(itask, TE_TASK_ABORT, CMD_OR_IOF_NA, NULL); ! mutex_enter(&itask->itask_mutex); ! if ((itask->itask_flags & ITASK_BEING_ABORTED) || ! ((itask->itask_flags & (ITASK_KNOWN_TO_TGT_PORT | ITASK_KNOWN_TO_LU)) == 0)) { + mutex_exit(&itask->itask_mutex); return; } ! itask->itask_flags |= ITASK_BEING_ABORTED; task->task_completion_status = s; if (((w = itask->itask_worker) == NULL) || (itask->itask_flags & ITASK_IN_TRANSITION)) { + mutex_exit(&itask->itask_mutex); return; } /* Queue it and get out */ if (itask->itask_flags & ITASK_IN_WORKER_QUEUE) { ! mutex_exit(&itask->itask_mutex); return; } ! mutex_enter(&w->worker_lock); ! STMF_ENQUEUE_ITASK(w, itask); mutex_exit(&w->worker_lock); + mutex_exit(&itask->itask_mutex); } void stmf_abort(int abort_cmd, scsi_task_t *task, stmf_status_t s, void *arg) { stmf_i_scsi_task_t *itask = NULL; ! 
uint32_t f, rf; DTRACE_PROBE2(scsi__task__abort, scsi_task_t *, task, stmf_status_t, s); switch (abort_cmd) {
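STMF_ENQUEUE_ITASK() and STMF_DEQUEUE_ITASK() replace the tail-queue manipulation that the old side open-codes in several functions. Their definitions live in a header outside this diff; a plausible shape for the enqueue side, reconstructed from the removed code and offered only as an assumption, is:

/*
 * Assumed definition, inferred from the open-coded queueing removed
 * above; the real macro is defined in a header not shown in this diff.
 */
#define	STMF_ENQUEUE_ITASK(w, itask) {					\
	ASSERT(mutex_owned(&(w)->worker_lock));				\
	(itask)->itask_worker_next = NULL;				\
	if ((w)->worker_task_tail)					\
		(w)->worker_task_tail->itask_worker_next = (itask);	\
	else								\
		(w)->worker_task_head = (itask);			\
	(w)->worker_task_tail = (itask);				\
	(itask)->itask_waitq_enter_timestamp = gethrtime();		\
	(itask)->itask_flags |= ITASK_IN_WORKER_QUEUE;			\
	if (++((w)->worker_queue_depth) > (w)->worker_max_qdepth_pu)	\
		(w)->worker_max_qdepth_pu = (w)->worker_queue_depth;	\
	if (((w)->worker_flags & STMF_WORKER_ACTIVE) == 0)		\
		cv_signal(&(w)->worker_cv);				\
}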
*** 4974,5003 **** f = ITASK_KNOWN_TO_LU; break; default: return; } itask = (stmf_i_scsi_task_t *)task->task_stmf_private; f |= ITASK_BEING_ABORTED | rf; ! do { ! old = new = itask->itask_flags; ! if ((old & f) != f) { return; } ! new &= ~rf; ! } while (atomic_cas_32(&itask->itask_flags, old, new) != old); } void stmf_task_lu_aborted(scsi_task_t *task, stmf_status_t s, uint32_t iof) { char info[STMF_CHANGE_INFO_LEN]; stmf_i_scsi_task_t *itask = TASK_TO_ITASK(task); unsigned long long st; stmf_task_audit(itask, TE_TASK_LU_ABORTED, iof, NULL); ! st = s; /* gcc fix */ if ((s != STMF_ABORT_SUCCESS) && (s != STMF_NOT_FOUND)) { (void) snprintf(info, sizeof (info), "task %p, lu failed to abort ret=%llx", (void *)task, st); } else if ((iof & STMF_IOF_LU_DONE) == 0) { --- 5339,5375 ---- f = ITASK_KNOWN_TO_LU; break; default: return; } + itask = (stmf_i_scsi_task_t *)task->task_stmf_private; + mutex_enter(&itask->itask_mutex); f |= ITASK_BEING_ABORTED | rf; ! ! if ((itask->itask_flags & f) != f) { ! mutex_exit(&itask->itask_mutex); return; } ! itask->itask_flags &= ~rf; ! mutex_exit(&itask->itask_mutex); ! } + /* + * NOTE: stmf_abort_task_offline will release and then reacquire the + * itask_mutex. This is required to prevent a lock order violation. + */ void stmf_task_lu_aborted(scsi_task_t *task, stmf_status_t s, uint32_t iof) { char info[STMF_CHANGE_INFO_LEN]; stmf_i_scsi_task_t *itask = TASK_TO_ITASK(task); unsigned long long st; stmf_task_audit(itask, TE_TASK_LU_ABORTED, iof, NULL); ! ASSERT(mutex_owned(&itask->itask_mutex)); st = s; /* gcc fix */ if ((s != STMF_ABORT_SUCCESS) && (s != STMF_NOT_FOUND)) { (void) snprintf(info, sizeof (info), "task %p, lu failed to abort ret=%llx", (void *)task, st); } else if ((iof & STMF_IOF_LU_DONE) == 0) {
*** 5013,5032 **** } stmf_abort_task_offline(task, 1, info); } void stmf_task_lport_aborted(scsi_task_t *task, stmf_status_t s, uint32_t iof) { char info[STMF_CHANGE_INFO_LEN]; stmf_i_scsi_task_t *itask = TASK_TO_ITASK(task); unsigned long long st; - uint32_t old, new; stmf_task_audit(itask, TE_TASK_LPORT_ABORTED, iof, NULL); - st = s; if ((s != STMF_ABORT_SUCCESS) && (s != STMF_NOT_FOUND)) { (void) snprintf(info, sizeof (info), "task %p, tgt port failed to abort ret=%llx", (void *)task, st); --- 5385,5407 ---- } stmf_abort_task_offline(task, 1, info); } + /* + * NOTE: stmf_abort_task_offline will release and then reacquire the + * itask_mutex. This is required to prevent a lock order violation. + */ void stmf_task_lport_aborted(scsi_task_t *task, stmf_status_t s, uint32_t iof) { char info[STMF_CHANGE_INFO_LEN]; stmf_i_scsi_task_t *itask = TASK_TO_ITASK(task); unsigned long long st; + ASSERT(mutex_owned(&itask->itask_mutex)); stmf_task_audit(itask, TE_TASK_LPORT_ABORTED, iof, NULL); st = s; if ((s != STMF_ABORT_SUCCESS) && (s != STMF_NOT_FOUND)) { (void) snprintf(info, sizeof (info), "task %p, tgt port failed to abort ret=%llx", (void *)task, st);
*** 5036,5074 **** "task=%p, s=%llx, iof=%x", (void *)task, st, iof); } else { /* * LPORT abort succeeded */ ! do { ! old = new = itask->itask_flags; ! if (!(old & ITASK_KNOWN_TO_TGT_PORT)) return; - new &= ~ITASK_KNOWN_TO_TGT_PORT; - } while (atomic_cas_32(&itask->itask_flags, old, new) != old); - return; } stmf_abort_task_offline(task, 0, info); } stmf_status_t stmf_task_poll_lu(scsi_task_t *task, uint32_t timeout) { stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *) task->task_stmf_private; stmf_worker_t *w = itask->itask_worker; int i; ASSERT(itask->itask_flags & ITASK_KNOWN_TO_LU); mutex_enter(&w->worker_lock); if (itask->itask_ncmds >= ITASK_MAX_NCMDS) { mutex_exit(&w->worker_lock); return (STMF_BUSY); } for (i = 0; i < itask->itask_ncmds; i++) { if (itask->itask_cmd_stack[i] == ITASK_CMD_POLL_LU) { mutex_exit(&w->worker_lock); return (STMF_SUCCESS); } } itask->itask_cmd_stack[itask->itask_ncmds++] = ITASK_CMD_POLL_LU; if (timeout == ITASK_DEFAULT_POLL_TIMEOUT) { --- 5411,5458 ---- "task=%p, s=%llx, iof=%x", (void *)task, st, iof); } else { /* * LPORT abort succeeded */ ! atomic_and_32(&itask->itask_flags, ~ITASK_KNOWN_TO_TGT_PORT); return; } stmf_abort_task_offline(task, 0, info); } + void + stmf_task_lport_aborted_unlocked(scsi_task_t *task, stmf_status_t s, + uint32_t iof) + { + stmf_i_scsi_task_t *itask = TASK_TO_ITASK(task); + + mutex_enter(&itask->itask_mutex); + stmf_task_lport_aborted(task, s, iof); + mutex_exit(&itask->itask_mutex); + } + stmf_status_t stmf_task_poll_lu(scsi_task_t *task, uint32_t timeout) { stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *) task->task_stmf_private; stmf_worker_t *w = itask->itask_worker; int i; + mutex_enter(&itask->itask_mutex); ASSERT(itask->itask_flags & ITASK_KNOWN_TO_LU); mutex_enter(&w->worker_lock); if (itask->itask_ncmds >= ITASK_MAX_NCMDS) { mutex_exit(&w->worker_lock); + mutex_exit(&itask->itask_mutex); return (STMF_BUSY); } for (i = 0; i < itask->itask_ncmds; i++) { if (itask->itask_cmd_stack[i] == ITASK_CMD_POLL_LU) { mutex_exit(&w->worker_lock); + mutex_exit(&itask->itask_mutex); return (STMF_SUCCESS); } } itask->itask_cmd_stack[itask->itask_ncmds++] = ITASK_CMD_POLL_LU; if (timeout == ITASK_DEFAULT_POLL_TIMEOUT) {
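stmf_task_lport_aborted_unlocked() gives port providers an entry point that does not require itask_mutex to be held, since the locked variant now asserts ownership. A hypothetical call site:

	/* Caller holds no task locks; the wrapper takes itask_mutex. */
	stmf_task_lport_aborted_unlocked(task, STMF_ABORT_SUCCESS,
	    STMF_IOF_LPORT_DONE);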
*** 5078,5102 **** if (t == 0) t = 1; itask->itask_poll_timeout = ddi_get_lbolt() + t; } if ((itask->itask_flags & ITASK_IN_WORKER_QUEUE) == 0) { ! itask->itask_worker_next = NULL; ! if (w->worker_task_tail) { ! w->worker_task_tail->itask_worker_next = itask; ! } else { ! w->worker_task_head = itask; } - w->worker_task_tail = itask; - if (++(w->worker_queue_depth) > w->worker_max_qdepth_pu) { - w->worker_max_qdepth_pu = w->worker_queue_depth; - } - atomic_or_32(&itask->itask_flags, ITASK_IN_WORKER_QUEUE); - if ((w->worker_flags & STMF_WORKER_ACTIVE) == 0) - cv_signal(&w->worker_cv); - } mutex_exit(&w->worker_lock); return (STMF_SUCCESS); } stmf_status_t stmf_task_poll_lport(scsi_task_t *task, uint32_t timeout) --- 5462,5475 ---- if (t == 0) t = 1; itask->itask_poll_timeout = ddi_get_lbolt() + t; } if ((itask->itask_flags & ITASK_IN_WORKER_QUEUE) == 0) { ! STMF_ENQUEUE_ITASK(w, itask); } mutex_exit(&w->worker_lock); + mutex_exit(&itask->itask_mutex); return (STMF_SUCCESS); } stmf_status_t stmf_task_poll_lport(scsi_task_t *task, uint32_t timeout)
*** 5104,5122 **** --- 5477,5498 ---- stmf_i_scsi_task_t *itask = (stmf_i_scsi_task_t *) task->task_stmf_private; stmf_worker_t *w = itask->itask_worker; int i; + mutex_enter(&itask->itask_mutex); ASSERT(itask->itask_flags & ITASK_KNOWN_TO_TGT_PORT); mutex_enter(&w->worker_lock); if (itask->itask_ncmds >= ITASK_MAX_NCMDS) { mutex_exit(&w->worker_lock); + mutex_exit(&itask->itask_mutex); return (STMF_BUSY); } for (i = 0; i < itask->itask_ncmds; i++) { if (itask->itask_cmd_stack[i] == ITASK_CMD_POLL_LPORT) { mutex_exit(&w->worker_lock); + mutex_exit(&itask->itask_mutex); return (STMF_SUCCESS); } } itask->itask_cmd_stack[itask->itask_ncmds++] = ITASK_CMD_POLL_LPORT; if (timeout == ITASK_DEFAULT_POLL_TIMEOUT) {
*** 5126,5149 **** if (t == 0) t = 1; itask->itask_poll_timeout = ddi_get_lbolt() + t; } if ((itask->itask_flags & ITASK_IN_WORKER_QUEUE) == 0) { ! itask->itask_worker_next = NULL; ! if (w->worker_task_tail) { ! w->worker_task_tail->itask_worker_next = itask; ! } else { ! w->worker_task_head = itask; } - w->worker_task_tail = itask; - if (++(w->worker_queue_depth) > w->worker_max_qdepth_pu) { - w->worker_max_qdepth_pu = w->worker_queue_depth; - } - if ((w->worker_flags & STMF_WORKER_ACTIVE) == 0) - cv_signal(&w->worker_cv); - } mutex_exit(&w->worker_lock); return (STMF_SUCCESS); } void stmf_do_task_abort(scsi_task_t *task) --- 5502,5515 ---- if (t == 0) t = 1; itask->itask_poll_timeout = ddi_get_lbolt() + t; } if ((itask->itask_flags & ITASK_IN_WORKER_QUEUE) == 0) { ! STMF_ENQUEUE_ITASK(w, itask); } mutex_exit(&w->worker_lock); + mutex_exit(&itask->itask_mutex); return (STMF_SUCCESS); } void stmf_do_task_abort(scsi_task_t *task)
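Both poll routines arm itask_poll_timeout in lbolt ticks. A worked instance of the conversion in the non-default branch (values invented; timeout is taken to be in milliseconds, matching the drv_usectohz(timeout * 1000) call above):

	clock_t t;

	/* 5 ms requested; with hz = 1000 this is 5 ticks. */
	t = drv_usectohz(5 * 1000);
	if (t == 0)
		t = 1;	/* never arm a zero-length poll */
	itask->itask_poll_timeout = ddi_get_lbolt() + t;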
*** 5150,5175 **** { stmf_i_scsi_task_t *itask = TASK_TO_ITASK(task); stmf_lu_t *lu; stmf_local_port_t *lport; unsigned long long ret; ! uint32_t old, new; uint8_t call_lu_abort, call_port_abort; char info[STMF_CHANGE_INFO_LEN]; lu = task->task_lu; lport = task->task_lport; ! do { ! old = new = itask->itask_flags; ! if ((old & (ITASK_KNOWN_TO_LU | ITASK_LU_ABORT_CALLED)) == ! ITASK_KNOWN_TO_LU) { new |= ITASK_LU_ABORT_CALLED; call_lu_abort = 1; } else { call_lu_abort = 0; } ! } while (atomic_cas_32(&itask->itask_flags, old, new) != old); if (call_lu_abort) { if ((itask->itask_flags & ITASK_DEFAULT_HANDLING) == 0) { ret = lu->lu_abort(lu, STMF_LU_ABORT_TASK, task, 0); } else { --- 5516,5541 ---- { stmf_i_scsi_task_t *itask = TASK_TO_ITASK(task); stmf_lu_t *lu; stmf_local_port_t *lport; unsigned long long ret; ! uint32_t new = 0; uint8_t call_lu_abort, call_port_abort; char info[STMF_CHANGE_INFO_LEN]; lu = task->task_lu; lport = task->task_lport; ! mutex_enter(&itask->itask_mutex); ! new = itask->itask_flags; ! if ((itask->itask_flags & (ITASK_KNOWN_TO_LU | ! ITASK_LU_ABORT_CALLED)) == ITASK_KNOWN_TO_LU) { new |= ITASK_LU_ABORT_CALLED; call_lu_abort = 1; } else { call_lu_abort = 0; } ! itask->itask_flags = new; if (call_lu_abort) { if ((itask->itask_flags & ITASK_DEFAULT_HANDLING) == 0) { ret = lu->lu_abort(lu, STMF_LU_ABORT_TASK, task, 0); } else {
*** 5193,5212 **** "lu abort timed out"); stmf_abort_task_offline(itask->itask_task, 1, info); } } ! do { ! old = new = itask->itask_flags; ! if ((old & (ITASK_KNOWN_TO_TGT_PORT | ITASK_TGT_PORT_ABORT_CALLED)) == ITASK_KNOWN_TO_TGT_PORT) { new |= ITASK_TGT_PORT_ABORT_CALLED; call_port_abort = 1; } else { call_port_abort = 0; } ! } while (atomic_cas_32(&itask->itask_flags, old, new) != old); if (call_port_abort) { ret = lport->lport_abort(lport, STMF_LPORT_ABORT_TASK, task, 0); if ((ret == STMF_ABORT_SUCCESS) || (ret == STMF_NOT_FOUND)) { stmf_task_lport_aborted(task, ret, STMF_IOF_LPORT_DONE); } else if (ret == STMF_BUSY) { --- 5559,5584 ---- "lu abort timed out"); stmf_abort_task_offline(itask->itask_task, 1, info); } } ! /* ! * NOTE: After the call to either stmf_abort_task_offline() or ! * stmf_task_lu_abort() the itask_mutex was dropped and reacquired ! * to avoid a deadlock situation with stmf_state.stmf_lock. ! */ ! ! new = itask->itask_flags; ! if ((itask->itask_flags & (ITASK_KNOWN_TO_TGT_PORT | ITASK_TGT_PORT_ABORT_CALLED)) == ITASK_KNOWN_TO_TGT_PORT) { new |= ITASK_TGT_PORT_ABORT_CALLED; call_port_abort = 1; } else { call_port_abort = 0; } ! itask->itask_flags = new; ! if (call_port_abort) { ret = lport->lport_abort(lport, STMF_LPORT_ABORT_TASK, task, 0); if ((ret == STMF_ABORT_SUCCESS) || (ret == STMF_NOT_FOUND)) { stmf_task_lport_aborted(task, ret, STMF_IOF_LPORT_DONE); } else if (ret == STMF_BUSY) {
*** 5226,5235 **** --- 5598,5608 ---- (void) snprintf(info, sizeof (info), "lport abort timed out"); stmf_abort_task_offline(itask->itask_task, 0, info); } } + mutex_exit(&itask->itask_mutex); } stmf_status_t stmf_ctl(int cmd, void *obj, void *arg) {
*** 5557,5583 **** uint8_t *p; uint32_t sz, asz, nports = 0, nports_standby = 0; mutex_enter(&stmf_state.stmf_lock); /* check if any ports are standby and create second group */ ! for (ilport = stmf_state.stmf_ilportlist; ilport; ilport = ilport->ilport_next) { if (ilport->ilport_standby == 1) { nports_standby++; } else { nports++; } } ! /* The spec only allows for 255 ports to be reported per group */ nports = min(nports, 255); nports_standby = min(nports_standby, 255); sz = (nports * 4) + 12; ! if (nports_standby && ilu_alua) { sz += (nports_standby * 4) + 8; } ! asz = sz + sizeof (*xd) - 4; xd = (stmf_xfer_data_t *)kmem_zalloc(asz, KM_NOSLEEP); if (xd == NULL) { mutex_exit(&stmf_state.stmf_lock); return (NULL); } --- 5930,5980 ---- uint8_t *p; uint32_t sz, asz, nports = 0, nports_standby = 0; mutex_enter(&stmf_state.stmf_lock); /* check if any ports are standby and create second group */ ! for (ilport = stmf_state.stmf_ilportlist; ilport != NULL; ilport = ilport->ilport_next) { if (ilport->ilport_standby == 1) { nports_standby++; } else { nports++; } } ! /* ! * Section 6.25 REPORT TARGET PORT GROUPS ! * The reply can contain many group replies. Each group is limited ! * to 255 port identifiers, so we'll need to limit the amount of ! * data returned. For FC ports there's a physical limitation in ! * machines that makes reaching 255 ports very, very unlikely. For ! * iSCSI, on the other hand, recent changes mean the port count could ! * be as high as 4096 (current limit). Limiting the data returned ! * for iSCSI isn't as bad as it sounds. This information is only ! * important for ALUA, which isn't supported for iSCSI. iSCSI uses ! * virtual IP addresses to deal with node failover in a cluster. ! */ nports = min(nports, 255); nports_standby = min(nports_standby, 255); + + /* + * The first 4 bytes of the returned data are the length. The + * size of the Target Port Group header is 8 bytes. So, that's where + * the 12 comes from. Each port entry is 4 bytes in size. + */ sz = (nports * 4) + 12; ! if (nports_standby != 0 && ilu_alua != 0) { ! /* ---- Only add 8 bytes since it's just the Group header ---- */ sz += (nports_standby * 4) + 8; } ! ! /* ! * The stmf_xfer_data structure contains 4 bytes that will be ! * part of the data buffer. So, subtract the 4 bytes from the space ! * needed. ! */ ! asz = sizeof (*xd) + sz - 4; xd = (stmf_xfer_data_t *)kmem_zalloc(asz, KM_NOSLEEP); if (xd == NULL) { mutex_exit(&stmf_state.stmf_lock); return (NULL); }
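A worked instance of the size arithmetic (values invented for illustration), with three active ports, two standby ports, and ALUA enabled:

	/*
	 * nports = 3, nports_standby = 2, ilu_alua != 0:
	 *
	 *   sz  = (3 * 4) + 12 = 24	4-byte return length, 8-byte group
	 *				header, three 4-byte port entries
	 *   sz += (2 * 4) + 8  = 40	second group header plus two entries
	 *   asz = sizeof (*xd) + 40 - 4
	 *				xd->buf already supplies the first
	 *				4 bytes of the payload
	 */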
*** 5584,5629 **** xd->alloc_size = asz; xd->size_left = sz; p = xd->buf; *((uint32_t *)p) = BE_32(sz - 4); p += 4; p[0] = 0x80; /* PREF */ p[1] = 5; /* AO_SUP, S_SUP */ if (stmf_state.stmf_alua_node == 1) { p[3] = 1; /* Group 1 */ } else { p[3] = 0; /* Group 0 */ } p[7] = nports & 0xff; p += 8; ! for (ilport = stmf_state.stmf_ilportlist; ilport; ilport = ilport->ilport_next) { if (ilport->ilport_standby == 1) { continue; } ((uint16_t *)p)[1] = BE_16(ilport->ilport_rtpid); p += 4; } ! if (nports_standby && ilu_alua) { p[0] = 0x02; /* Non PREF, Standby */ p[1] = 5; /* AO_SUP, S_SUP */ if (stmf_state.stmf_alua_node == 1) { p[3] = 0; /* Group 0 */ } else { p[3] = 1; /* Group 1 */ } p[7] = nports_standby & 0xff; p += 8; ! for (ilport = stmf_state.stmf_ilportlist; ilport; ! ilport = ilport->ilport_next) { if (ilport->ilport_standby == 0) { continue; } ((uint16_t *)p)[1] = BE_16(ilport->ilport_rtpid); p += 4; } } mutex_exit(&stmf_state.stmf_lock); --- 5981,6031 ---- xd->alloc_size = asz; xd->size_left = sz; p = xd->buf; + /* ---- length values never include the field that holds the size ---- */ *((uint32_t *)p) = BE_32(sz - 4); p += 4; + + /* ---- Now fill out the first Target Group header ---- */ p[0] = 0x80; /* PREF */ p[1] = 5; /* AO_SUP, S_SUP */ if (stmf_state.stmf_alua_node == 1) { p[3] = 1; /* Group 1 */ } else { p[3] = 0; /* Group 0 */ } p[7] = nports & 0xff; p += 8; ! for (ilport = stmf_state.stmf_ilportlist; ilport != NULL && nports != 0; ilport = ilport->ilport_next) { if (ilport->ilport_standby == 1) { continue; } ((uint16_t *)p)[1] = BE_16(ilport->ilport_rtpid); p += 4; + nports--; } ! if (nports_standby != 0 && ilu_alua != 0) { p[0] = 0x02; /* Non PREF, Standby */ p[1] = 5; /* AO_SUP, S_SUP */ if (stmf_state.stmf_alua_node == 1) { p[3] = 0; /* Group 0 */ } else { p[3] = 1; /* Group 1 */ } p[7] = nports_standby & 0xff; p += 8; ! for (ilport = stmf_state.stmf_ilportlist; ilport != NULL && ! nports_standby != 0; ilport = ilport->ilport_next) { if (ilport->ilport_standby == 0) { continue; } ((uint16_t *)p)[1] = BE_16(ilport->ilport_rtpid); p += 4; + nports_standby--; } } mutex_exit(&stmf_state.stmf_lock);
*** 5860,5870 **** --- 6262,6274 ---- stmf_i_lu_t *ilu = (stmf_i_lu_t *)task->task_lu->lu_stmf_private; stmf_xfer_data_t *xd; uint32_t sz, minsz; + mutex_enter(&itask->itask_mutex); itask->itask_flags |= ITASK_DEFAULT_HANDLING; + task->task_cmd_xfer_length = ((((uint32_t)task->task_cdb[6]) << 24) | (((uint32_t)task->task_cdb[7]) << 16) | (((uint32_t)task->task_cdb[8]) << 8) | ((uint32_t)task->task_cdb[9]));
*** 5872,5881 **** --- 6276,6286 ---- if (task->task_additional_flags & TASK_AF_NO_EXPECTED_XFER_LENGTH) { task->task_expected_xfer_length = task->task_cmd_xfer_length; } + mutex_exit(&itask->itask_mutex); if (task->task_cmd_xfer_length == 0) { stmf_scsilib_send_status(task, STATUS_GOOD, 0); return; }
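The shift-and-or in the previous hunk is the usual open-coded read of the big-endian 32-bit allocation length in CDB bytes 6 through 9. An equivalent helper (a sketch, not part of this change) makes the intent explicit:

/* Sketch: read a big-endian 32-bit field starting at cdb[off]. */
static uint32_t
cdb_get_be32(const uint8_t *cdb, int off)
{
	return (((uint32_t)cdb[off] << 24) |
	    ((uint32_t)cdb[off + 1] << 16) |
	    ((uint32_t)cdb[off + 2] << 8) |
	    (uint32_t)cdb[off + 3]);
}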
*** 5977,5987 **** --- 6382,6394 ---- /* * Mark this task as the one causing LU reset so that we know who * was responsible for setting the ILU_RESET_ACTIVE. In case this * task itself gets aborted, we will clear ILU_RESET_ACTIVE. */ + mutex_enter(&itask->itask_mutex); itask->itask_flags |= ITASK_DEFAULT_HANDLING | ITASK_CAUSING_LU_RESET; + mutex_exit(&itask->itask_mutex); /* Initiate abort on all commands on this LU except this one */ stmf_abort(STMF_QUEUE_ABORT_LU, task, STMF_ABORTED, task->task_lu); /* Start polling on this task */
*** 6053,6064 **** --- 6460,6473 ---- stmf_scsilib_send_status(task, STATUS_GOOD, 0); return; } /* ok, start the damage */ + mutex_enter(&itask->itask_mutex); itask->itask_flags |= ITASK_DEFAULT_HANDLING | ITASK_CAUSING_TARGET_RESET; + mutex_exit(&itask->itask_mutex); for (i = 0; i < lm->lm_nentries; i++) { if (lm->lm_plus[i] == NULL) continue; lm_ent = (stmf_lun_map_ent_t *)lm->lm_plus[i]; ilu = (stmf_i_lu_t *)(lm_ent->ent_lu->lu_stmf_private);
*** 6112,6149 **** void stmf_worker_init() { uint32_t i; - /* Make local copy of global tunables */ - stmf_i_max_nworkers = stmf_max_nworkers; - stmf_i_min_nworkers = stmf_min_nworkers; ASSERT(stmf_workers == NULL); ! if (stmf_i_min_nworkers < 4) { ! stmf_i_min_nworkers = 4; } ! if (stmf_i_max_nworkers < stmf_i_min_nworkers) { ! stmf_i_max_nworkers = stmf_i_min_nworkers; ! } stmf_workers = (stmf_worker_t *)kmem_zalloc( ! sizeof (stmf_worker_t) * stmf_i_max_nworkers, KM_SLEEP); ! for (i = 0; i < stmf_i_max_nworkers; i++) { stmf_worker_t *w = &stmf_workers[i]; mutex_init(&w->worker_lock, NULL, MUTEX_DRIVER, NULL); cv_init(&w->worker_cv, NULL, CV_DRIVER, NULL); } - stmf_worker_mgmt_delay = drv_usectohz(20 * 1000); stmf_workers_state = STMF_WORKERS_ENABLED; ! /* Workers will be started by stmf_worker_mgmt() */ /* Let's wait for at least one worker to start */ while (stmf_nworkers_cur == 0) delay(drv_usectohz(20 * 1000)); - stmf_worker_mgmt_delay = drv_usectohz(3 * 1000 * 1000); } stmf_status_t stmf_worker_fini() { --- 6521,6570 ---- void stmf_worker_init() { uint32_t i; + stmf_worker_t *w; + /* + * Allow workers to be scaled down to a very low number for cases + * where the load is light. If the number of threads gets below + * 4, assume it is a mistake and force the threads back to a + * reasonable number. The low limit of 4 is simply legacy and + * may be too low. + */ ASSERT(stmf_workers == NULL); ! if (stmf_nworkers < 4) { ! stmf_nworkers = 64; } ! stmf_workers = (stmf_worker_t *)kmem_zalloc( ! sizeof (stmf_worker_t) * stmf_nworkers, KM_SLEEP); ! for (i = 0; i < stmf_nworkers; i++) { stmf_worker_t *w = &stmf_workers[i]; mutex_init(&w->worker_lock, NULL, MUTEX_DRIVER, NULL); cv_init(&w->worker_cv, NULL, CV_DRIVER, NULL); } stmf_workers_state = STMF_WORKERS_ENABLED; ! /* Check if we are starting */ ! if (stmf_nworkers_cur < stmf_nworkers - 1) { ! for (i = stmf_nworkers_cur; i < stmf_nworkers; i++) { ! w = &stmf_workers[i]; ! w->worker_tid = thread_create(NULL, 0, stmf_worker_task, ! (void *)&stmf_workers[i], 0, &p0, TS_RUN, ! minclsyspri); ! stmf_nworkers_accepting_cmds++; ! } ! return; ! } /* Let's wait for at least one worker to start */ while (stmf_nworkers_cur == 0) delay(drv_usectohz(20 * 1000)); } stmf_status_t stmf_worker_fini() {
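With the self-scaling logic gone, stmf_nworkers is the only sizing knob left. Assuming it remains a patchable global as shown here, the pool size can be pinned at boot in the usual way:

* /etc/system fragment; assumes stmf_nworkers stays a global tunable
set stmf:stmf_nworkers = 128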
*** 6152,6162 **** if (stmf_workers_state == STMF_WORKERS_DISABLED) return (STMF_SUCCESS); ASSERT(stmf_workers); stmf_workers_state = STMF_WORKERS_DISABLED; - stmf_worker_mgmt_delay = drv_usectohz(20 * 1000); cv_signal(&stmf_state.stmf_cv); sb = ddi_get_lbolt() + drv_usectohz(10 * 1000 * 1000); /* Wait for all the threads to die */ while (stmf_nworkers_cur != 0) { --- 6573,6582 ----
*** 6164,6179 **** stmf_workers_state = STMF_WORKERS_ENABLED; return (STMF_BUSY); } delay(drv_usectohz(100 * 1000)); } ! for (i = 0; i < stmf_i_max_nworkers; i++) { stmf_worker_t *w = &stmf_workers[i]; mutex_destroy(&w->worker_lock); cv_destroy(&w->worker_cv); } ! kmem_free(stmf_workers, sizeof (stmf_worker_t) * stmf_i_max_nworkers); stmf_workers = NULL; return (STMF_SUCCESS); } --- 6584,6599 ---- stmf_workers_state = STMF_WORKERS_ENABLED; return (STMF_BUSY); } delay(drv_usectohz(100 * 1000)); } ! for (i = 0; i < stmf_nworkers; i++) { stmf_worker_t *w = &stmf_workers[i]; mutex_destroy(&w->worker_lock); cv_destroy(&w->worker_cv); } ! kmem_free(stmf_workers, sizeof (stmf_worker_t) * stmf_nworkers); stmf_workers = NULL; return (STMF_SUCCESS); }
*** 6186,6196 **** stmf_i_scsi_task_t *itask; stmf_data_buf_t *dbuf; stmf_lu_t *lu; clock_t wait_timer = 0; clock_t wait_ticks, wait_delta = 0; - uint32_t old, new; uint8_t curcmd; uint8_t abort_free; uint8_t wait_queue; uint8_t dec_qdepth; --- 6606,6615 ----
*** 6198,6219 **** wait_ticks = drv_usectohz(10000); DTRACE_PROBE1(worker__create, stmf_worker_t, w); mutex_enter(&w->worker_lock); w->worker_flags |= STMF_WORKER_STARTED | STMF_WORKER_ACTIVE; ! stmf_worker_loop:; if ((w->worker_ref_count == 0) && (w->worker_flags & STMF_WORKER_TERMINATE)) { w->worker_flags &= ~(STMF_WORKER_STARTED | STMF_WORKER_ACTIVE | STMF_WORKER_TERMINATE); w->worker_tid = NULL; mutex_exit(&w->worker_lock); DTRACE_PROBE1(worker__destroy, stmf_worker_t, w); thread_exit(); } /* CONSTCOND */ while (1) { dec_qdepth = 0; if (wait_timer && (ddi_get_lbolt() >= wait_timer)) { wait_timer = 0; wait_delta = 0; if (w->worker_wait_head) { --- 6617,6643 ---- wait_ticks = drv_usectohz(10000); DTRACE_PROBE1(worker__create, stmf_worker_t, w); mutex_enter(&w->worker_lock); w->worker_flags |= STMF_WORKER_STARTED | STMF_WORKER_ACTIVE; ! atomic_inc_32(&stmf_nworkers_cur); ! ! stmf_worker_loop: if ((w->worker_ref_count == 0) && (w->worker_flags & STMF_WORKER_TERMINATE)) { w->worker_flags &= ~(STMF_WORKER_STARTED | STMF_WORKER_ACTIVE | STMF_WORKER_TERMINATE); w->worker_tid = NULL; mutex_exit(&w->worker_lock); DTRACE_PROBE1(worker__destroy, stmf_worker_t, w); + atomic_dec_32(&stmf_nworkers_cur); thread_exit(); } + /* CONSTCOND */ while (1) { + /* worker lock is held at this point */ dec_qdepth = 0; if (wait_timer && (ddi_get_lbolt() >= wait_timer)) { wait_timer = 0; wait_delta = 0; if (w->worker_wait_head) {
*** 6227,6272 **** w->worker_task_tail = w->worker_wait_tail; w->worker_wait_head = w->worker_wait_tail = NULL; } } ! if ((itask = w->worker_task_head) == NULL) { break; ! } task = itask->itask_task; DTRACE_PROBE2(worker__active, stmf_worker_t, w, scsi_task_t *, task); - w->worker_task_head = itask->itask_worker_next; - if (w->worker_task_head == NULL) - w->worker_task_tail = NULL; - wait_queue = 0; abort_free = 0; if (itask->itask_ncmds > 0) { curcmd = itask->itask_cmd_stack[itask->itask_ncmds - 1]; } else { ASSERT(itask->itask_flags & ITASK_BEING_ABORTED); } ! do { ! old = itask->itask_flags; ! if (old & ITASK_BEING_ABORTED) { itask->itask_ncmds = 1; curcmd = itask->itask_cmd_stack[0] = ITASK_CMD_ABORT; goto out_itask_flag_loop; ! } else if ((curcmd & ITASK_CMD_MASK) == ! ITASK_CMD_NEW_TASK) { /* * set ITASK_KSTAT_IN_RUNQ, this flag * will not reset until task completed */ ! new = old | ITASK_KNOWN_TO_LU | ITASK_KSTAT_IN_RUNQ; } else { goto out_itask_flag_loop; } - } while (atomic_cas_32(&itask->itask_flags, old, new) != old); out_itask_flag_loop: /* * Decide if this task needs to go to a queue and/or if --- 6651,6695 ---- w->worker_task_tail = w->worker_wait_tail; w->worker_wait_head = w->worker_wait_tail = NULL; } } ! ! STMF_DEQUEUE_ITASK(w, itask); ! if (itask == NULL) break; ! ! ASSERT((itask->itask_flags & ITASK_IN_FREE_LIST) == 0); task = itask->itask_task; DTRACE_PROBE2(worker__active, stmf_worker_t, w, scsi_task_t *, task); wait_queue = 0; abort_free = 0; + mutex_exit(&w->worker_lock); + mutex_enter(&itask->itask_mutex); + mutex_enter(&w->worker_lock); + if (itask->itask_ncmds > 0) { curcmd = itask->itask_cmd_stack[itask->itask_ncmds - 1]; } else { ASSERT(itask->itask_flags & ITASK_BEING_ABORTED); } ! if (itask->itask_flags & ITASK_BEING_ABORTED) { itask->itask_ncmds = 1; curcmd = itask->itask_cmd_stack[0] = ITASK_CMD_ABORT; goto out_itask_flag_loop; ! } else if ((curcmd & ITASK_CMD_MASK) == ITASK_CMD_NEW_TASK) { /* * set ITASK_KSTAT_IN_RUNQ, this flag * will not reset until task completed */ ! itask->itask_flags |= ITASK_KNOWN_TO_LU | ITASK_KSTAT_IN_RUNQ; } else { goto out_itask_flag_loop; } out_itask_flag_loop: /* * Decide if this task needs to go to a queue and/or if
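Note the ordering dance after STMF_DEQUEUE_ITASK() in the hunk above: the worker already holds worker_lock but must also take itask_mutex, and the lock order established elsewhere in this patch is itask_mutex before worker_lock. Hence the drop-and-reacquire, in skeleton form:

	/* Lock order is itask_mutex, then worker_lock. */
	mutex_exit(&w->worker_lock);	/* cannot take itask_mutex on top */
	mutex_enter(&itask->itask_mutex);
	mutex_enter(&w->worker_lock);	/* reacquire in the legal order */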
*** 6321,6349 **** /* We made it here means we are going to call LU */ if ((itask->itask_flags & ITASK_DEFAULT_HANDLING) == 0) lu = task->task_lu; else lu = dlun0; dbuf = itask->itask_dbufs[ITASK_CMD_BUF_NDX(curcmd)]; mutex_exit(&w->worker_lock); curcmd &= ITASK_CMD_MASK; stmf_task_audit(itask, TE_PROCESS_CMD, curcmd, dbuf); switch (curcmd) { case ITASK_CMD_NEW_TASK: iss = (stmf_i_scsi_session_t *) task->task_session->ss_stmf_private; stmf_itl_lu_new_task(itask); if (iss->iss_flags & ISS_LUN_INVENTORY_CHANGED) { ! if (stmf_handle_cmd_during_ic(itask)) break; } #ifdef DEBUG if (stmf_drop_task_counter > 0) { ! if (atomic_dec_32_nv(&stmf_drop_task_counter) ! == 1) break; } #endif DTRACE_PROBE1(scsi__task__start, scsi_task_t *, task); lu->lu_new_task(task, dbuf); break; case ITASK_CMD_DATA_XFER_DONE: --- 6744,6777 ---- /* We made it here means we are going to call LU */ if ((itask->itask_flags & ITASK_DEFAULT_HANDLING) == 0) lu = task->task_lu; else lu = dlun0; + dbuf = itask->itask_dbufs[ITASK_CMD_BUF_NDX(curcmd)]; mutex_exit(&w->worker_lock); curcmd &= ITASK_CMD_MASK; stmf_task_audit(itask, TE_PROCESS_CMD, curcmd, dbuf); + mutex_exit(&itask->itask_mutex); + switch (curcmd) { case ITASK_CMD_NEW_TASK: iss = (stmf_i_scsi_session_t *) task->task_session->ss_stmf_private; stmf_itl_lu_new_task(itask); if (iss->iss_flags & ISS_LUN_INVENTORY_CHANGED) { ! if (stmf_handle_cmd_during_ic(itask)) { break; } + } #ifdef DEBUG if (stmf_drop_task_counter > 0) { ! if (atomic_dec_32_nv( ! (uint32_t *)&stmf_drop_task_counter) == 1) { break; } + } #endif DTRACE_PROBE1(scsi__task__start, scsi_task_t *, task); lu->lu_new_task(task, dbuf); break; case ITASK_CMD_DATA_XFER_DONE:
*** 6352,6361 **** --- 6780,6790 ---- case ITASK_CMD_STATUS_DONE: lu->lu_send_status_done(task); break; case ITASK_CMD_ABORT: if (abort_free) { + mutex_enter(&itask->itask_mutex); stmf_task_free(task); } else { stmf_do_task_abort(task); } break;
*** 6370,6379 **** --- 6799,6809 ---- break; case ITASK_CMD_SEND_STATUS: /* case ITASK_CMD_XFER_DATA: */ break; } + mutex_enter(&w->worker_lock); if (dec_qdepth) { w->worker_queue_depth--; } }
*** 6397,6546 **** DTRACE_PROBE1(worker__wakeup, stmf_worker_t, w); w->worker_flags |= STMF_WORKER_ACTIVE; goto stmf_worker_loop; } - void - stmf_worker_mgmt() - { - int i; - int workers_needed; - uint32_t qd; - clock_t tps, d = 0; - uint32_t cur_max_ntasks = 0; - stmf_worker_t *w; - - /* Check if we are trying to increase the # of threads */ - for (i = stmf_nworkers_cur; i < stmf_nworkers_needed; i++) { - if (stmf_workers[i].worker_flags & STMF_WORKER_STARTED) { - stmf_nworkers_cur++; - stmf_nworkers_accepting_cmds++; - } else { - /* Wait for transition to complete */ - return; - } - } - /* Check if we are trying to decrease the # of workers */ - for (i = (stmf_nworkers_cur - 1); i >= stmf_nworkers_needed; i--) { - if ((stmf_workers[i].worker_flags & STMF_WORKER_STARTED) == 0) { - stmf_nworkers_cur--; - /* - * stmf_nworkers_accepting_cmds has already been - * updated by the request to reduce the # of workers. - */ - } else { - /* Wait for transition to complete */ - return; - } - } - /* Check if we are being asked to quit */ - if (stmf_workers_state != STMF_WORKERS_ENABLED) { - if (stmf_nworkers_cur) { - workers_needed = 0; - goto worker_mgmt_trigger_change; - } - return; - } - /* Check if we are starting */ - if (stmf_nworkers_cur < stmf_i_min_nworkers) { - workers_needed = stmf_i_min_nworkers; - goto worker_mgmt_trigger_change; - } - - tps = drv_usectohz(1 * 1000 * 1000); - if ((stmf_wm_last != 0) && - ((d = ddi_get_lbolt() - stmf_wm_last) > tps)) { - qd = 0; - for (i = 0; i < stmf_nworkers_accepting_cmds; i++) { - qd += stmf_workers[i].worker_max_qdepth_pu; - stmf_workers[i].worker_max_qdepth_pu = 0; - if (stmf_workers[i].worker_max_sys_qdepth_pu > - cur_max_ntasks) { - cur_max_ntasks = - stmf_workers[i].worker_max_sys_qdepth_pu; - } - stmf_workers[i].worker_max_sys_qdepth_pu = 0; - } - } - stmf_wm_last = ddi_get_lbolt(); - if (d <= tps) { - /* still ramping up */ - return; - } - /* max qdepth cannot be more than max tasks */ - if (qd > cur_max_ntasks) - qd = cur_max_ntasks; - - /* See if we have more workers */ - if (qd < stmf_nworkers_accepting_cmds) { - /* - * Since we dont reduce the worker count right away, monitor - * the highest load during the scale_down_delay. 
- */ - if (qd > stmf_worker_scale_down_qd) - stmf_worker_scale_down_qd = qd; - if (stmf_worker_scale_down_timer == 0) { - stmf_worker_scale_down_timer = ddi_get_lbolt() + - drv_usectohz(stmf_worker_scale_down_delay * - 1000 * 1000); - return; - } - if (ddi_get_lbolt() < stmf_worker_scale_down_timer) { - return; - } - /* Its time to reduce the workers */ - if (stmf_worker_scale_down_qd < stmf_i_min_nworkers) - stmf_worker_scale_down_qd = stmf_i_min_nworkers; - if (stmf_worker_scale_down_qd > stmf_i_max_nworkers) - stmf_worker_scale_down_qd = stmf_i_max_nworkers; - if (stmf_worker_scale_down_qd == stmf_nworkers_cur) - return; - workers_needed = stmf_worker_scale_down_qd; - stmf_worker_scale_down_qd = 0; - goto worker_mgmt_trigger_change; - } - stmf_worker_scale_down_qd = 0; - stmf_worker_scale_down_timer = 0; - if (qd > stmf_i_max_nworkers) - qd = stmf_i_max_nworkers; - if (qd < stmf_i_min_nworkers) - qd = stmf_i_min_nworkers; - if (qd == stmf_nworkers_cur) - return; - workers_needed = qd; - goto worker_mgmt_trigger_change; - - /* NOTREACHED */ - return; - - worker_mgmt_trigger_change: - ASSERT(workers_needed != stmf_nworkers_cur); - if (workers_needed > stmf_nworkers_cur) { - stmf_nworkers_needed = workers_needed; - for (i = stmf_nworkers_cur; i < workers_needed; i++) { - w = &stmf_workers[i]; - w->worker_tid = thread_create(NULL, 0, stmf_worker_task, - (void *)&stmf_workers[i], 0, &p0, TS_RUN, - minclsyspri); - } - return; - } - /* At this point we know that we are decreasing the # of workers */ - stmf_nworkers_accepting_cmds = workers_needed; - stmf_nworkers_needed = workers_needed; - /* Signal the workers that its time to quit */ - for (i = (stmf_nworkers_cur - 1); i >= stmf_nworkers_needed; i--) { - w = &stmf_workers[i]; - ASSERT(w && (w->worker_flags & STMF_WORKER_STARTED)); - mutex_enter(&w->worker_lock); - w->worker_flags |= STMF_WORKER_TERMINATE; - if ((w->worker_flags & STMF_WORKER_ACTIVE) == 0) - cv_signal(&w->worker_cv); - mutex_exit(&w->worker_lock); - } - } - /* * Fills out a dbuf from stmf_xfer_data_t (contained in the db_lu_private). * If all the data has been filled out, frees the xd and makes * db_lu_private NULL. */ --- 6827,6836 ----
*** 6653,6665 **** p[2] = 5; p[3] = 0x12; p[4] = inq_page_length; p[6] = 0x80; ! (void) strncpy((char *)p+8, "SUN ", 8); ! (void) strncpy((char *)p+16, "COMSTAR ", 16); ! (void) strncpy((char *)p+32, "1.0 ", 4); dbuf->db_data_size = sz; dbuf->db_relative_offset = 0; dbuf->db_flags = DB_DIRECTION_TO_RPORT; (void) stmf_xfer_data(task, dbuf, 0); --- 6943,6955 ---- p[2] = 5; p[3] = 0x12; p[4] = inq_page_length; p[6] = 0x80; ! (void) strncpy((char *)p+8, "NONE ", 8); ! (void) strncpy((char *)p+16, "NONE ", 16); ! (void) strncpy((char *)p+32, "NONE", 4); dbuf->db_data_size = sz; dbuf->db_relative_offset = 0; dbuf->db_flags = DB_DIRECTION_TO_RPORT; (void) stmf_xfer_data(task, dbuf, 0);
*** 6865,6875 **** --- 7155,7171 ---- { /* This function will never be called */ cmn_err(CE_WARN, "stmf_dlun0_ctl called with cmd %x", cmd); } + /* ARGSUSED */ void + stmf_dlun0_task_done(struct scsi_task *task) + { + } + + void stmf_dlun_init() { stmf_i_lu_t *ilu; dlun0 = stmf_alloc(STMF_STRUCT_STMF_LU, 0, 0);
*** 6878,6887 **** --- 7174,7184 ---- dlun0->lu_dbuf_xfer_done = stmf_dlun0_dbuf_done; dlun0->lu_send_status_done = stmf_dlun0_status_done; dlun0->lu_task_free = stmf_dlun0_task_free; dlun0->lu_abort = stmf_dlun0_abort; dlun0->lu_task_poll = stmf_dlun0_task_poll; + dlun0->lu_task_done = stmf_dlun0_task_done; dlun0->lu_ctl = stmf_dlun0_ctl; ilu = (stmf_i_lu_t *)dlun0->lu_stmf_private; ilu->ilu_cur_task_cntr = &ilu->ilu_task_cntr1; }
*** 7142,7162 **** --- 7439,7470 ---- stmf_itl_task_start(stmf_i_scsi_task_t *itask) { stmf_itl_data_t *itl = itask->itask_itl_datap; scsi_task_t *task = itask->itask_task; stmf_i_lu_t *ilu; + stmf_i_scsi_session_t *iss = + itask->itask_task->task_session->ss_stmf_private; + stmf_i_remote_port_t *irport = iss->iss_irport; if (itl == NULL || task->task_lu == dlun0) return; ilu = (stmf_i_lu_t *)task->task_lu->lu_stmf_private; itask->itask_start_timestamp = gethrtime(); + itask->itask_xfer_done_timestamp = 0; if (ilu->ilu_kstat_io != NULL) { mutex_enter(ilu->ilu_kstat_io->ks_lock); stmf_update_kstat_lu_q(itask->itask_task, kstat_waitq_enter); mutex_exit(ilu->ilu_kstat_io->ks_lock); } + if (irport->irport_kstat_estat != NULL) { + if (task->task_flags & TF_READ_DATA) + atomic_inc_32(&irport->irport_nread_tasks); + else if (task->task_flags & TF_WRITE_DATA) + atomic_inc_32(&irport->irport_nwrite_tasks); + } + stmf_update_kstat_lport_q(itask->itask_task, kstat_waitq_enter); } void stmf_itl_lu_new_task(stmf_i_scsi_task_t *itask)
*** 7191,7200 **** --- 7499,7510 ---- ilu = (stmf_i_lu_t *)task->task_lu->lu_stmf_private; if (ilu->ilu_kstat_io == NULL) return; + stmf_update_kstat_rport_estat(task); + mutex_enter(ilu->ilu_kstat_io->ks_lock); if (itask->itask_flags & ITASK_KSTAT_IN_RUNQ) { stmf_update_kstat_lu_q(task, kstat_runq_exit); mutex_exit(ilu->ilu_kstat_io->ks_lock);
*** 7204,7213 **** --- 7514,7540 ---- mutex_exit(ilu->ilu_kstat_io->ks_lock); stmf_update_kstat_lport_q(task, kstat_waitq_exit); } } + void + stmf_lu_xfer_done(scsi_task_t *task, boolean_t read, hrtime_t elapsed_time) + { + stmf_i_scsi_task_t *itask = task->task_stmf_private; + + if (task->task_lu == dlun0) + return; + + if (read) { + atomic_add_64((uint64_t *)&itask->itask_lu_read_time, + elapsed_time); + } else { + atomic_add_64((uint64_t *)&itask->itask_lu_write_time, + elapsed_time); + } + } + static void stmf_lport_xfer_start(stmf_i_scsi_task_t *itask, stmf_data_buf_t *dbuf) { stmf_itl_data_t *itl = itask->itask_itl_datap;
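stmf_lu_xfer_done() lets an LU provider charge back-end latency to the per-task read/write accumulators. A hypothetical caller timing a backing-store read (everything but the API call is invented):

	hrtime_t start = gethrtime();

	/* ... perform the backing-store read for this task ... */

	/* B_TRUE selects itask_lu_read_time as the accumulator. */
	stmf_lu_xfer_done(task, B_TRUE, gethrtime() - start);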
*** 7231,7241 **** return; xfer_size = (dbuf->db_xfer_status == STMF_SUCCESS) ? dbuf->db_data_size : 0; ! elapsed_time = gethrtime() - dbuf->db_xfer_start_timestamp; if (dbuf->db_flags & DB_DIRECTION_TO_RPORT) { atomic_add_64((uint64_t *)&itask->itask_lport_read_time, elapsed_time); atomic_add_64((uint64_t *)&itask->itask_read_xfer, xfer_size); --- 7558,7570 ---- return; xfer_size = (dbuf->db_xfer_status == STMF_SUCCESS) ? dbuf->db_data_size : 0; ! itask->itask_xfer_done_timestamp = gethrtime(); ! elapsed_time = itask->itask_xfer_done_timestamp - ! dbuf->db_xfer_start_timestamp; if (dbuf->db_flags & DB_DIRECTION_TO_RPORT) { atomic_add_64((uint64_t *)&itask->itask_lport_read_time, elapsed_time); atomic_add_64((uint64_t *)&itask->itask_read_xfer, xfer_size);
*** 7255,7265 **** void stmf_svc_init() { if (stmf_state.stmf_svc_flags & STMF_SVC_STARTED) return; ! stmf_state.stmf_svc_tailp = &stmf_state.stmf_svc_active; stmf_state.stmf_svc_taskq = ddi_taskq_create(0, "STMF_SVC_TASKQ", 1, TASKQ_DEFAULTPRI, 0); (void) ddi_taskq_dispatch(stmf_state.stmf_svc_taskq, stmf_svc, 0, DDI_SLEEP); } --- 7584,7595 ---- void stmf_svc_init() { if (stmf_state.stmf_svc_flags & STMF_SVC_STARTED) return; ! list_create(&stmf_state.stmf_svc_list, sizeof (stmf_svc_req_t), ! offsetof(stmf_svc_req_t, svc_list_entry)); stmf_state.stmf_svc_taskq = ddi_taskq_create(0, "STMF_SVC_TASKQ", 1, TASKQ_DEFAULTPRI, 0); (void) ddi_taskq_dispatch(stmf_state.stmf_svc_taskq, stmf_svc, 0, DDI_SLEEP); }
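list_create() requires a list_node_t embedded at the stated offset, and the offsetof() argument fixes the field name, so stmf_svc_req_t must carry linkage roughly as below (member types and the trailing members are assumptions; only the field names used in this diff are certain):

typedef struct stmf_svc_req {
	list_node_t	svc_list_entry;	/* linkage for stmf_svc_list */
	int		svc_cmd;
	void		*svc_obj;
	int		svc_req_alloc_size;
	/* ... remaining members elided ... */
} stmf_svc_req_t;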
*** 7284,7293 **** --- 7614,7624 ---- break; } if (i == 500) return (STMF_BUSY); + list_destroy(&stmf_state.stmf_svc_list); ddi_taskq_destroy(stmf_state.stmf_svc_taskq); return (STMF_SUCCESS); }
*** 7309,7319 **** mutex_enter(&stmf_state.stmf_lock); stmf_state.stmf_svc_flags |= STMF_SVC_STARTED | STMF_SVC_ACTIVE; while (!(stmf_state.stmf_svc_flags & STMF_SVC_TERMINATE)) { ! if (stmf_state.stmf_svc_active == NULL) { stmf_svc_timeout(&clks); continue; } /* --- 7640,7650 ---- mutex_enter(&stmf_state.stmf_lock); stmf_state.stmf_svc_flags |= STMF_SVC_STARTED | STMF_SVC_ACTIVE; while (!(stmf_state.stmf_svc_flags & STMF_SVC_TERMINATE)) { ! if (list_is_empty(&stmf_state.stmf_svc_list)) { stmf_svc_timeout(&clks); continue; } /*
*** 7320,7335 **** * Pop the front request from the active list. After this, * the request will no longer be referenced by global state, * so it should be safe to access it without holding the * stmf state lock. */ ! req = stmf_state.stmf_svc_active; ! stmf_state.stmf_svc_active = req->svc_next; - if (stmf_state.stmf_svc_active == NULL) - stmf_state.stmf_svc_tailp = &stmf_state.stmf_svc_active; - switch (req->svc_cmd) { case STMF_CMD_LPORT_ONLINE: /* Fallthrough */ case STMF_CMD_LPORT_OFFLINE: mutex_exit(&stmf_state.stmf_lock); --- 7651,7664 ---- * Pop the front request from the active list. After this, * the request will no longer be referenced by global state, * so it should be safe to access it without holding the * stmf state lock. */ ! req = list_remove_head(&stmf_state.stmf_svc_list); ! if (req == NULL) ! continue; switch (req->svc_cmd) { case STMF_CMD_LPORT_ONLINE: /* Fallthrough */ case STMF_CMD_LPORT_OFFLINE: mutex_exit(&stmf_state.stmf_lock);
*** 7396,7406 **** /* we still have some ilu items to check */ clks->timing_next = ddi_get_lbolt() + drv_usectohz(1*1000*1000); } ! if (stmf_state.stmf_svc_active) return; } /* Check if there are free tasks to clear */ if (stmf_state.stmf_nlus && --- 7725,7735 ---- /* we still have some ilu items to check */ clks->timing_next = ddi_get_lbolt() + drv_usectohz(1*1000*1000); } ! if (!list_is_empty(&stmf_state.stmf_svc_list)) return; } /* Check if there are free tasks to clear */ if (stmf_state.stmf_nlus &&
*** 7421,7441 **** /* we still have some ilu items to check */ clks->drain_next = ddi_get_lbolt() + drv_usectohz(1*1000*1000); } ! if (stmf_state.stmf_svc_active) return; } - /* Check if we need to run worker_mgmt */ - if (ddi_get_lbolt() > clks->worker_delay) { - stmf_worker_mgmt(); - clks->worker_delay = ddi_get_lbolt() + - stmf_worker_mgmt_delay; - } - /* Check if any active session got its 1st LUN */ if (stmf_state.stmf_process_initial_luns) { int stmf_level = 0; int port_level; --- 7750,7763 ---- /* we still have some ilu items to check */ clks->drain_next = ddi_get_lbolt() + drv_usectohz(1*1000*1000); } ! if (!list_is_empty(&stmf_state.stmf_svc_list)) return; } /* Check if any active session got its 1st LUN */ if (stmf_state.stmf_process_initial_luns) { int stmf_level = 0; int port_level;
*** 7558,7604 **** sizeof (stmf_svc_req_t))); (void) strcpy(req->svc_info.st_additional_info, info->st_additional_info); } req->svc_req_alloc_size = s; - req->svc_next = NULL; mutex_enter(&stmf_state.stmf_lock); ! *stmf_state.stmf_svc_tailp = req; ! stmf_state.stmf_svc_tailp = &req->svc_next; if ((stmf_state.stmf_svc_flags & STMF_SVC_ACTIVE) == 0) { cv_signal(&stmf_state.stmf_cv); } mutex_exit(&stmf_state.stmf_lock); } static void stmf_svc_kill_obj_requests(void *obj) { - stmf_svc_req_t *prev_req = NULL; stmf_svc_req_t *next_req; stmf_svc_req_t *req; ASSERT(mutex_owned(&stmf_state.stmf_lock)); ! for (req = stmf_state.stmf_svc_active; req != NULL; req = next_req) { ! next_req = req->svc_next; ! if (req->svc_obj == obj) { ! if (prev_req != NULL) ! prev_req->svc_next = next_req; ! else ! stmf_state.stmf_svc_active = next_req; ! ! if (next_req == NULL) ! stmf_state.stmf_svc_tailp = (prev_req != NULL) ? ! &prev_req->svc_next : ! &stmf_state.stmf_svc_active; ! kmem_free(req, req->svc_req_alloc_size); - } else { - prev_req = req; } } } void --- 7880,7910 ---- sizeof (stmf_svc_req_t))); (void) strcpy(req->svc_info.st_additional_info, info->st_additional_info); } req->svc_req_alloc_size = s; mutex_enter(&stmf_state.stmf_lock); ! list_insert_tail(&stmf_state.stmf_svc_list, req); if ((stmf_state.stmf_svc_flags & STMF_SVC_ACTIVE) == 0) { cv_signal(&stmf_state.stmf_cv); } mutex_exit(&stmf_state.stmf_lock); } static void stmf_svc_kill_obj_requests(void *obj) { stmf_svc_req_t *next_req; stmf_svc_req_t *req; ASSERT(mutex_owned(&stmf_state.stmf_lock)); ! for (req = list_head(&stmf_state.stmf_svc_list); req != NULL; ! req = next_req) { ! /* ! * Capture the successor before a match is removed and ! * freed; list_next() on a freed node is a use-after-free. ! */ ! next_req = list_next(&stmf_state.stmf_svc_list, req); if (req->svc_obj == obj) { ! list_remove(&stmf_state.stmf_svc_list, req); kmem_free(req, req->svc_req_alloc_size); } } } void
*** 7640,7660 **** --- 7946,7978 ---- if (trace_buf_size > 0) stmf_trace_buf[0] = 0; mutex_exit(&trace_buf_lock); } + /* + * NOTE: Due to lock order problems that are not possible to fix this + * method drops and reacquires the itask_mutex around the call to stmf_ctl. + * Another possible work around would be to use a dispatch queue and have + * the call to stmf_ctl run on another thread that's not holding the + * itask_mutex. The problem with that approach is that it's difficult to + * determine what impact an asynchronous change would have on the system state. + */ static void stmf_abort_task_offline(scsi_task_t *task, int offline_lu, char *info) { stmf_state_change_info_t change_info; void *ctl_private; uint32_t ctl_cmd; int msg = 0; + stmf_i_scsi_task_t *itask = + (stmf_i_scsi_task_t *)task->task_stmf_private; stmf_trace("FROM STMF", "abort_task_offline called for %s: %s", offline_lu ? "LU" : "LPORT", info ? info : "no additional info"); change_info.st_additional_info = info; + ASSERT(mutex_owned(&itask->itask_mutex)); + if (offline_lu) { change_info.st_rflags = STMF_RFLAG_RESET | STMF_RFLAG_LU_ABORT; ctl_private = task->task_lu; if (((stmf_i_lu_t *)
*** 7678,7688 **** --- 7996,8008 ---- if (msg) { stmf_trace(0, "Calling stmf_ctl to offline %s : %s", offline_lu ? "LU" : "LPORT", info ? info : "<no additional info>"); } + mutex_exit(&itask->itask_mutex); (void) stmf_ctl(ctl_cmd, ctl_private, &change_info); + mutex_enter(&itask->itask_mutex); } static char stmf_ctoi(char c) {
*** 7735,7745 **** /* FC Transport ID validation checks. SPC3 rev23, Table 284 */ if (total_sz < tpd_len || tptid->format_code != 0) return (B_FALSE); break; ! case PROTOCOL_iSCSI: { iscsi_transport_id_t *iscsiid; uint16_t adn_len, name_len; /* Check for valid format code, SPC3 rev 23 Table 288 */ --- 8055,8065 ---- /* FC Transport ID validation checks. SPC3 rev23, Table 284 */ if (total_sz < tpd_len || tptid->format_code != 0) return (B_FALSE); break; ! case PROTOCOL_iSCSI: /* CSTYLED */ { iscsi_transport_id_t *iscsiid; uint16_t adn_len, name_len; /* Check for valid format code, SPC3 rev 23 Table 288 */
*** 7780,7790 **** case PROTOCOL_SSA: case PROTOCOL_IEEE_1394: case PROTOCOL_SAS: case PROTOCOL_ADT: case PROTOCOL_ATAPI: ! default: { stmf_dflt_scsi_tptid_t *dflttpd; tpd_len = sizeof (stmf_dflt_scsi_tptid_t); if (total_sz < tpd_len) --- 8100,8110 ---- case PROTOCOL_SSA: case PROTOCOL_IEEE_1394: case PROTOCOL_SAS: case PROTOCOL_ADT: case PROTOCOL_ATAPI: ! default: /* CSTYLED */ { stmf_dflt_scsi_tptid_t *dflttpd; tpd_len = sizeof (stmf_dflt_scsi_tptid_t); if (total_sz < tpd_len)
*** 7809,7819 **** (tpd1->format_code != tpd2->format_code)) return (B_FALSE); switch (tpd1->protocol_id) { ! case PROTOCOL_iSCSI: { iscsi_transport_id_t *iscsitpd1, *iscsitpd2; uint16_t len; iscsitpd1 = (iscsi_transport_id_t *)tpd1; --- 8129,8139 ---- (tpd1->format_code != tpd2->format_code)) return (B_FALSE); switch (tpd1->protocol_id) { ! case PROTOCOL_iSCSI: /* CSTYLED */ { iscsi_transport_id_t *iscsitpd1, *iscsitpd2; uint16_t len; iscsitpd1 = (iscsi_transport_id_t *)tpd1;
*** 7824,7834 **** != 0)) return (B_FALSE); } break; ! case PROTOCOL_SRP: { scsi_srp_transport_id_t *srptpd1, *srptpd2; srptpd1 = (scsi_srp_transport_id_t *)tpd1; srptpd2 = (scsi_srp_transport_id_t *)tpd2; --- 8144,8154 ---- != 0)) return (B_FALSE); } break; ! case PROTOCOL_SRP: /* CSTYLED */ { scsi_srp_transport_id_t *srptpd1, *srptpd2; srptpd1 = (scsi_srp_transport_id_t *)tpd1; srptpd2 = (scsi_srp_transport_id_t *)tpd2;
*** 7836,7846 **** sizeof (srptpd1->srp_name)) != 0) return (B_FALSE); } break; ! case PROTOCOL_FIBRE_CHANNEL: { scsi_fc_transport_id_t *fctpd1, *fctpd2; fctpd1 = (scsi_fc_transport_id_t *)tpd1; fctpd2 = (scsi_fc_transport_id_t *)tpd2; --- 8156,8166 ---- sizeof (srptpd1->srp_name)) != 0) return (B_FALSE); } break; ! case PROTOCOL_FIBRE_CHANNEL: /* CSTYLED */ { scsi_fc_transport_id_t *fctpd1, *fctpd2; fctpd1 = (scsi_fc_transport_id_t *)tpd1; fctpd2 = (scsi_fc_transport_id_t *)tpd2;
*** 7854,7864 **** case PROTOCOL_SSA: case PROTOCOL_IEEE_1394: case PROTOCOL_SAS: case PROTOCOL_ADT: case PROTOCOL_ATAPI: ! default: { stmf_dflt_scsi_tptid_t *dflt1, *dflt2; uint16_t len; dflt1 = (stmf_dflt_scsi_tptid_t *)tpd1; --- 8174,8184 ---- case PROTOCOL_SSA: case PROTOCOL_IEEE_1394: case PROTOCOL_SAS: case PROTOCOL_ADT: case PROTOCOL_ATAPI: ! default: /* CSTYLED */ { stmf_dflt_scsi_tptid_t *dflt1, *dflt2; uint16_t len; dflt1 = (stmf_dflt_scsi_tptid_t *)tpd1;
*** 7965,7975 **** return (NULL); } stmf_remote_port_t * ! stmf_remote_port_alloc(uint16_t tptid_sz) { stmf_remote_port_t *rpt; rpt = (stmf_remote_port_t *)kmem_zalloc( sizeof (stmf_remote_port_t) + tptid_sz, KM_SLEEP); rpt->rport_tptid_sz = tptid_sz; rpt->rport_tptid = (scsi_transport_id_t *)(rpt + 1); --- 8285,8296 ---- return (NULL); } stmf_remote_port_t * ! stmf_remote_port_alloc(uint16_t tptid_sz) ! { stmf_remote_port_t *rpt; rpt = (stmf_remote_port_t *)kmem_zalloc( sizeof (stmf_remote_port_t) + tptid_sz, KM_SLEEP); rpt->rport_tptid_sz = tptid_sz; rpt->rport_tptid = (scsi_transport_id_t *)(rpt + 1);
*** 7985,7990 **** --- 8306,8382 ---- * it is safe to deallocate it in a protocol independent manner. * If any of the allocation method changes, corresponding changes * need to be made here too. */ kmem_free(rpt, sizeof (stmf_remote_port_t) + rpt->rport_tptid_sz); + } + + stmf_lu_t * + stmf_check_and_hold_lu(scsi_task_t *task, uint8_t *guid) + { + stmf_i_scsi_session_t *iss; + stmf_lu_t *lu; + stmf_i_lu_t *ilu = NULL; + stmf_lun_map_t *sm; + stmf_lun_map_ent_t *lme; + int i; + + iss = (stmf_i_scsi_session_t *)task->task_session->ss_stmf_private; + rw_enter(iss->iss_lockp, RW_READER); + sm = iss->iss_sm; + + for (i = 0; i < sm->lm_nentries; i++) { + if (sm->lm_plus[i] == NULL) + continue; + lme = (stmf_lun_map_ent_t *)sm->lm_plus[i]; + lu = lme->ent_lu; + if (bcmp(lu->lu_id->ident, guid, 16) == 0) { + break; + } + lu = NULL; + } + + if (!lu) { + goto hold_lu_done; + } + + ilu = lu->lu_stmf_private; + mutex_enter(&ilu->ilu_task_lock); + ilu->ilu_additional_ref++; + mutex_exit(&ilu->ilu_task_lock); + + hold_lu_done: + rw_exit(iss->iss_lockp); + return (lu); + } + + void + stmf_release_lu(stmf_lu_t *lu) + { + stmf_i_lu_t *ilu; + + ilu = lu->lu_stmf_private; + ASSERT(ilu->ilu_additional_ref != 0); + mutex_enter(&ilu->ilu_task_lock); + ilu->ilu_additional_ref--; + mutex_exit(&ilu->ilu_task_lock); + } + + int + stmf_is_task_being_aborted(scsi_task_t *task) + { + stmf_i_scsi_task_t *itask; + + itask = (stmf_i_scsi_task_t *)task->task_stmf_private; + if (itask->itask_flags & ITASK_BEING_ABORTED) + return (1); + + return (0); + } + + volatile boolean_t stmf_pgr_aptpl_always = B_FALSE; + + boolean_t + stmf_is_pgr_aptpl_always() + { + return (stmf_pgr_aptpl_always); }
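stmf_check_and_hold_lu() and stmf_release_lu() form a hold/release pair around ilu_additional_ref. A hypothetical consumer (the GUID value is invented) brackets its use of the LU:

	uint8_t guid[16] = { 0 };	/* invented 16-byte LU ident */
	stmf_lu_t *lu;

	lu = stmf_check_and_hold_lu(task, guid);
	if (lu != NULL) {
		/* ... the LU cannot be freed while the hold is in place ... */
		stmf_release_lu(lu);
	}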