OS-5464 signalfd deadlock on pollwakeup
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
OS-5370 panic in signalfd
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
OS-3742 lxbrand add support for signalfd
OS-4382 remove obsolete brand hooks added during lx development

*** 8,117 ****
   * source. A copy of the CDDL is also available via the Internet at
   * http://www.illumos.org/license/CDDL.
   */
  
  /*
!  * Copyright 2015 Joyent, Inc.
   */
  
  /*
   * Support for the signalfd facility, a Linux-borne facility for
   * file descriptor-based synchronous signal consumption.
   *
   * As described on the signalfd(3C) man page, the general idea behind these
   * file descriptors is that they can be used to synchronously consume signals
!  * via the read(2) syscall. That capability already exists with the
!  * sigwaitinfo(3C) function but the key advantage of signalfd is that, because
!  * it is file descriptor based, poll(2) can be used to determine when signals
!  * are available to be consumed.
   *
!  * The general implementation uses signalfd_state to hold both the signal set
!  * and poll head for an open file descriptor. Because a process can be using
!  * different sigfds with different signal sets, each signalfd_state poll head
!  * can be thought of as an independent signal stream and the thread(s) waiting
!  * on that stream will get poll notification when any signal in the
!  * corresponding set is received.
   *
!  * The sigfd_proc_state_t struct lives on the proc_t and maintains per-proc
!  * state for function callbacks and data when the proc needs to do work during
!  * signal delivery for pollwakeup.
   *
!  * The read side of the implementation is straightforward and mimics the
!  * kernel behavior for sigtimedwait(). Signals continue to live on either
!  * the proc's p_sig, or thread's t_sig, member. Read consumes the signal so
!  * that it is no longer pending.
   *
!  * The poll side is more complex since all of the sigfds on the process need
!  * to be examined every time a signal is delivered to the process in order to
!  * pollwake any thread waiting in poll for that signal.
   *
!  * Because it is likely that a process will only be using one, or a few, sigfds,
!  * but many total file descriptors, we maintain a list of sigfds which need
!  * pollwakeup. The list lives on the proc's p_sigfd struct. In this way only
!  * zero, or a few, of the state structs will need to be examined every time a
!  * signal is delivered to the process, instead of having to examine all of the
!  * file descriptors to find the state structs. When a state struct with a
!  * matching signal set is found then pollwakeup is called.
   *
!  * The sigfd_list is self-cleaning; as signalfd_pollwake_cb is called, the list
!  * will clear out on its own. There is an exit helper (signalfd_exit_helper)
!  * which cleans up any remaining per-proc state when the process exits.
   *
!  * The main complexity with signalfd is the interaction of forking and polling.
!  * This interaction is complex because now two processes have a fd that
!  * references the same dev_t (and its associated signalfd_state), but signals
!  * go to only one of those processes. Also, we don't know when one of the
!  * processes closes its fd because our 'close' entry point is only called when
!  * the last fd is closed (which could be by either process).
   *
!  * Because the state struct is referenced by both file descriptors, and the
!  * state struct represents a signal stream needing a pollwakeup, if both
!  * processes were polling then both processes would get a pollwakeup when a
!  * signal arrives for either process (that is, the pollhead is associated with
!  * our dev_t so when a signal arrives the pollwakeup wakes up all waiters).
   *
!  * Fortunately this is not a common problem in practice, but the implementation
!  * attempts to mitigate unexpected behavior. The typical behavior is that the
!  * parent has been polling the signalfd (which is why it was open in the first
!  * place) and the parent might have a pending signalfd_state (with the
!  * pollhead) on its per-process sigfd_list. After the fork the child will
!  * simply close that fd (among others) as part of the typical fork/close/exec
!  * pattern. Because the child will never poll that fd, it will never get any
!  * state onto its own sigfd_list (the child starts with a null list). The
!  * intention is that the child sees no pollwakeup activity for signals unless
!  * it explicitly reinvokes poll on the sigfd.
   *
!  * As background, there are two primary polling cases to consider when the
!  * parent process forks:
!  * 1) If any thread is blocked in poll(2) then both the parent and child will
!  *    return from the poll syscall with EINTR. This means that if either
!  *    process wants to re-poll on a sigfd then it needs to re-run poll and
!  *    would come back in to the signalfd_poll entry point. The parent would
!  *    already have the dev_t's state on its sigfd_list and the child would not
!  *    have anything there unless it called poll again on its fd.
!  * 2) If the process is using /dev/poll(7D) then the polling info is being
!  *    cached by the poll device and the process might not currently be blocked
!  *    on anything polling related. A subsequent DP_POLL ioctl will not invoke
!  *    our signalfd_poll entry point again. Because the parent still has its
!  *    sigfd_list setup, an incoming signal will hit our signalfd_pollwake_cb
!  *    entry point, which in turn calls pollwake, and /dev/poll will do the
!  *    right thing on DP_POLL. The child will not have a sigfd_list yet so the
!  *    signal will not cause a pollwakeup. The dp code does its own handling for
!  *    cleaning up its cache.
   *
!  * This leaves only one odd corner case. If the parent and child both use
!  * the dup-ed sigfd to poll then when a signal is delivered to either process
!  * there is no way to determine which one should get the pollwakeup (since
!  * both processes will be queued on the same signal stream poll head). What
!  * happens in this case is that both processes will return from poll, but only
!  * one of them will actually have a signal to read. The other will return
!  * from read with EAGAIN, or block. This case is actually similar to the
!  * situation within a single process which got two different sigfd's with the
!  * same mask (or poll on two fd's that are dup-ed). Both would return from poll
!  * when a signal arrives but only one read would consume the signal and the
!  * other read would fail or block. Applications which poll on shared fd's
!  * cannot assume that a subsequent read will actually obtain data.
   */
  
  #include <sys/ddi.h>
  #include <sys/sunddi.h>
  #include <sys/signalfd.h>
--- 8,101 ----
   * source. A copy of the CDDL is also available via the Internet at
   * http://www.illumos.org/license/CDDL.
   */
  
  /*
!  * Copyright 2016 Joyent, Inc.
   */
  
  /*
   * Support for the signalfd facility, a Linux-borne facility for
   * file descriptor-based synchronous signal consumption.
   *
   * As described on the signalfd(3C) man page, the general idea behind these
   * file descriptors is that they can be used to synchronously consume signals
!  * via the read(2) syscall. While that capability already exists with the
!  * sigwaitinfo(3C) function, signalfd holds an advantage since it is file
!  * descriptor based: It is able to use the event facilities (poll(2),
!  * /dev/poll, event ports) to notify interested parties when consumable
!  * signals arrive.
   *
!  * The signalfd lifecycle begins when a process opens /dev/signalfd. A minor
!  * number will be allocated for it along with an associated signalfd_state_t
!  * struct. It is there where the mask of desired signals resides.
   *
!  * Reading from the signalfd is straightforward and mimics the kernel behavior
!  * for sigtimedwait(). Signals continue to live on either the proc's p_sig, or
!  * thread's t_sig, member. During a read operation, those which match the mask
!  * are consumed so they are no longer pending.
   *
!  * The poll side is more complex. Every time a signal is delivered, all of the
!  * signalfds on the process need to be examined in order to pollwake threads
!  * waiting for signal arrival.
   *
!  * When a thread polling on a signalfd requires a pollhead, several steps must
!  * be taken to safely ensure the proper result. A sigfd_proc_state_t is
!  * created for the calling process if it does not yet exist. It is there where
!  * a list of sigfd_poll_waiter_t structures resides which associates pollheads
!  * to signalfd_state_t entries. The sigfd_proc_state_t list is walked to find
!  * a sigfd_poll_waiter_t matching the signalfd_state_t which corresponds to
!  * the polled resource. If one is found, it is reused. Otherwise a new one is
!  * created, incrementing the refcount on the signalfd_state_t, and it is added
!  * to the sigfd_poll_waiter_t list.
   *
!  * The complications imposed by fork(2) are why the pollhead is stored in the
!  * associated sigfd_poll_waiter_t instead of directly in the signalfd_state_t.
!  * More than one process can hold a reference to the signalfd at a time but
!  * arriving signals should wake only process-local pollers. Additionally,
!  * signalfd_close is called only when the last referencing fd is closed, hiding
!  * occurrences of preceding threads which released their references. This
!  * necessitates reference counting on the signalfd_state_t so it is able to
!  * persist after close until all poll references have been cleansed. Doing so
!  * ensures that blocked pollers which hold references to the signalfd_state_t
!  * will be able to do clean-up after the descriptor itself has been closed.
   *
!  * When a signal arrives in a process polling on signalfd, signalfd_pollwake_cb
!  * is called via the pointer in sigfd_proc_state_t. It will walk over the
!  * sigfd_poll_waiter_t entries present in the list, searching for any
!  * associated with a signalfd_state_t with a matching signal mask. The
!  * approach of keeping the poller list in p_sigfd was chosen because a process
!  * is likely to use few signalfds relative to its total file descriptors. It
!  * reduces the work required for each received signal.
   *
!  * When matching sigfd_poll_waiter_t entries are encountered in the poller list
!  * during signalfd_pollwake_cb, they are dispatched into signalfd_wakeq to
!  * perform the pollwake. This is due to a lock ordering conflict between
!  * signalfd_poll and signalfd_pollwake_cb. The former acquires
!  * pollcache_t`pc_lock before proc_t`p_lock. The latter (via sigtoproc)
!  * reverses the order. Deferring the pollwake into a taskq means it can be
!  * performed without proc_t`p_lock held, avoiding the deadlock.
   *
!  * The sigfd_list is self-cleaning; as signalfd_pollwake_cb is called, the list
!  * will clear out on its own. Any per-process state which remains will be
!  * cleaned up by the exit helper (signalfd_exit_helper).
   *
!  * The structures associated with signalfd state are designed to operate
!  * correctly across fork, but there is one caveat that applies. Using
!  * fork-shared signalfd descriptors in conjunction with fork-shared caching
!  * poll descriptors (such as /dev/poll or event ports) will result in missed
!  * poll wake-ups. This is caused by the pollhead identity of signalfd
!  * descriptors being dependent on the process they are polled from. Because it
!  * has a thread-local cache, poll(2) is unaffected by this limitation.
   *
!  * Lock ordering:
   *
!  * 1. signalfd_lock
!  * 2. signalfd_state_t`sfd_lock
!  *
!  * 1. proc_t`p_lock (to walk p_sigfd)
!  * 2. signalfd_state_t`sfd_lock
!  * 2a. signalfd_lock (after sfd_lock is dropped, when sfd_count falls to 0)
   */
  
  #include <sys/ddi.h>
  #include <sys/sunddi.h>
  #include <sys/signalfd.h>
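The consumption model described in the block comment above can be seen from
userland in a minimal sketch, assuming the Linux-compatible signalfd(3C)
entry point and <sys/signalfd.h> definitions; error handling is abbreviated:

        #include <sys/signalfd.h>
        #include <signal.h>
        #include <poll.h>
        #include <unistd.h>
        #include <stdio.h>

        int
        main(void)
        {
                sigset_t mask;
                struct signalfd_siginfo si;
                struct pollfd pfd;
                int fd;

                (void) sigemptyset(&mask);
                (void) sigaddset(&mask, SIGINT);

                /* Block normal delivery so the signal stays pending. */
                (void) sigprocmask(SIG_BLOCK, &mask, NULL);

                fd = signalfd(-1, &mask, 0);
                pfd.fd = fd;
                pfd.events = POLLIN;

                /* Poll for a consumable signal, then read it synchronously. */
                if (poll(&pfd, 1, -1) > 0 &&
                    read(fd, &si, sizeof (si)) == sizeof (si)) {
                        (void) printf("consumed signal %d\n", (int)si.ssi_signo);
                }

                (void) close(fd);
                return (0);
        }

Note that this is exactly the pattern the fork caveat above applies to: if the
fd and a caching poll descriptor were both shared across fork, the child could
miss wake-ups until it re-polls the fd itself.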
*** 121,246 ****
  #include <sys/stat.h>
  #include <sys/file.h>
  #include <sys/schedctl.h>
  #include <sys/id_space.h>
  #include <sys/sdt.h>
  
  typedef struct signalfd_state signalfd_state_t;
  
  struct signalfd_state {
!         kmutex_t sfd_lock;              /* lock protecting state */
!         pollhead_t sfd_pollhd;          /* poll head */
          k_sigset_t sfd_set;             /* signals for this fd */
-         signalfd_state_t *sfd_next;     /* next state on global list */
  };
  
  /*
!  * Internal global variables.
   */
! static kmutex_t         signalfd_lock;          /* lock protecting state */
  static dev_info_t       *signalfd_devi;         /* device info */
  static id_space_t       *signalfd_minor;        /* minor number arena */
  static void             *signalfd_softstate;    /* softstate pointer */
! static signalfd_state_t *signalfd_state;        /* global list of state */
  
! /*
!  * If we don't already have an entry in the proc's list for this state, add one.
!  */
  static void
! signalfd_wake_list_add(signalfd_state_t *state)
  {
!         proc_t *p = curproc;
!         list_t *lst;
!         sigfd_wake_list_t *wlp;
  
!         ASSERT(MUTEX_HELD(&p->p_lock));
!         ASSERT(p->p_sigfd != NULL);
  
!         lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
!         for (wlp = list_head(lst); wlp != NULL; wlp = list_next(lst, wlp)) {
!                 if (wlp->sigfd_wl_state == state)
!                         break;
          }
  
!         if (wlp == NULL) {
!                 wlp = kmem_zalloc(sizeof (sigfd_wake_list_t), KM_SLEEP);
!                 wlp->sigfd_wl_state = state;
!                 list_insert_head(lst, wlp);
          }
  }
  
! static void
! signalfd_wake_rm(list_t *lst, sigfd_wake_list_t *wlp)
  {
!         list_remove(lst, wlp);
!         kmem_free(wlp, sizeof (sigfd_wake_list_t));
! }
  
! static void
! signalfd_wake_list_rm(proc_t *p, signalfd_state_t *state)
! {
!         sigfd_wake_list_t *wlp;
!         list_t *lst;
  
!         ASSERT(MUTEX_HELD(&p->p_lock));
  
!         if (p->p_sigfd == NULL)
!                 return;
  
!         lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
!         for (wlp = list_head(lst); wlp != NULL; wlp = list_next(lst, wlp)) {
!                 if (wlp->sigfd_wl_state == state) {
!                         signalfd_wake_rm(lst, wlp);
                          break;
                  }
          }
  
!         if (list_is_empty(lst)) {
!                 ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb = NULL;
!                 list_destroy(lst);
!                 kmem_free(p->p_sigfd, sizeof (sigfd_proc_state_t));
!                 p->p_sigfd = NULL;
          }
  }
  
  static void
  signalfd_wake_list_cleanup(proc_t *p)
  {
!         sigfd_wake_list_t *wlp;
          list_t *lst;
  
          ASSERT(MUTEX_HELD(&p->p_lock));
  
!         ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb = NULL;
  
!         lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
!         while (!list_is_empty(lst)) {
!                 wlp = (sigfd_wake_list_t *)list_remove_head(lst);
!                 kmem_free(wlp, sizeof (sigfd_wake_list_t));
          }
  }
  
  static void
  signalfd_exit_helper(void)
  {
          proc_t *p = curproc;
-         list_t *lst;
  
-         /* This being non-null is the only way we can get here */
-         ASSERT(p->p_sigfd != NULL);
  
          mutex_enter(&p->p_lock);
-         lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
  
          signalfd_wake_list_cleanup(p);
-         list_destroy(lst);
-         kmem_free(p->p_sigfd, sizeof (sigfd_proc_state_t));
-         p->p_sigfd = NULL;
          mutex_exit(&p->p_lock);
  }
  
  /*
   * Called every time a signal is delivered to the process so that we can
   * see if any signal stream needs a pollwakeup. We maintain a list of
   * signal state elements so that we don't have to look at every file descriptor
   * on the process. If necessary, a further optimization would be to maintain a
   * signal set mask that is a union of all of the sets in the list so that
--- 105,292 ----
  #include <sys/stat.h>
  #include <sys/file.h>
  #include <sys/schedctl.h>
  #include <sys/id_space.h>
  #include <sys/sdt.h>
+ #include <sys/brand.h>
+ #include <sys/disp.h>
+ #include <sys/taskq_impl.h>
  
  typedef struct signalfd_state signalfd_state_t;
  
  struct signalfd_state {
!         list_node_t sfd_list;           /* node in global list */
!         kmutex_t sfd_lock;              /* protects fields below */
!         uint_t sfd_count;               /* ref count */
!         boolean_t sfd_valid;            /* valid while open */
          k_sigset_t sfd_set;             /* signals for this fd */
  };
  
+ typedef struct sigfd_poll_waiter {
+         list_node_t             spw_list;
+         signalfd_state_t        *spw_state;
+         pollhead_t              spw_pollhd;
+         taskq_ent_t             spw_taskent;
+         short                   spw_pollev;
+ } sigfd_poll_waiter_t;
+ 
  /*
!  * Protects global state in signalfd_devi, signalfd_minor, signalfd_softstate,
!  * and signalfd_state (including sfd_list field of members)
   */
! static kmutex_t         signalfd_lock;
  static dev_info_t       *signalfd_devi;         /* device info */
  static id_space_t       *signalfd_minor;        /* minor number arena */
  static void             *signalfd_softstate;    /* softstate pointer */
! static list_t           signalfd_state;         /* global list of state */
! static taskq_t          *signalfd_wakeq;        /* pollwake event taskq */
  
! static void
! signalfd_state_enter_locked(signalfd_state_t *state)
  {
!         ASSERT(MUTEX_HELD(&state->sfd_lock));
!         ASSERT(state->sfd_count > 0);
!         VERIFY(state->sfd_valid == B_TRUE);
  
!         state->sfd_count++;
! }
  
! static void
! signalfd_state_release(signalfd_state_t *state, boolean_t force_invalidate)
! {
!         mutex_enter(&state->sfd_lock);
  
!         if (force_invalidate) {
!                 state->sfd_valid = B_FALSE;
          }
  
!         ASSERT(state->sfd_count > 0);
!         if (state->sfd_count == 1) {
!                 VERIFY(state->sfd_valid == B_FALSE);
!                 mutex_exit(&state->sfd_lock);
!                 if (force_invalidate) {
!                         /*
!                          * The invalidation performed in signalfd_close is
!                          * done while signalfd_lock is held.
!                          */
!                         ASSERT(MUTEX_HELD(&signalfd_lock));
!                         list_remove(&signalfd_state, state);
!                 } else {
!                         ASSERT(MUTEX_NOT_HELD(&signalfd_lock));
!                         mutex_enter(&signalfd_lock);
!                         list_remove(&signalfd_state, state);
!                         mutex_exit(&signalfd_lock);
                  }
+                 kmem_free(state, sizeof (*state));
+                 return;
+         }
+         state->sfd_count--;
+         mutex_exit(&state->sfd_lock);
  }
  
! static sigfd_poll_waiter_t *
! signalfd_wake_list_add(sigfd_proc_state_t *pstate, signalfd_state_t *state)
  {
!         list_t *lst = &pstate->sigfd_list;
!         sigfd_poll_waiter_t *pw;
  
!         for (pw = list_head(lst); pw != NULL; pw = list_next(lst, pw)) {
!                 if (pw->spw_state == state)
!                         break;
!         }
  
!         if (pw == NULL) {
!                 pw = kmem_zalloc(sizeof (*pw), KM_SLEEP);
  
!                 mutex_enter(&state->sfd_lock);
!                 signalfd_state_enter_locked(state);
!                 pw->spw_state = state;
!                 mutex_exit(&state->sfd_lock);
!                 list_insert_head(lst, pw);
!         }
!         return (pw);
! }
  
! static sigfd_poll_waiter_t *
! signalfd_wake_list_rm(sigfd_proc_state_t *pstate, signalfd_state_t *state)
! {
!         list_t *lst = &pstate->sigfd_list;
!         sigfd_poll_waiter_t *pw;
! 
!         for (pw = list_head(lst); pw != NULL; pw = list_next(lst, pw)) {
!                 if (pw->spw_state == state) {
                          break;
                  }
          }
  
!         if (pw != NULL) {
!                 list_remove(lst, pw);
!                 pw->spw_state = NULL;
!                 signalfd_state_release(state, B_FALSE);
          }
+ 
+         return (pw);
  }
  
  static void
  signalfd_wake_list_cleanup(proc_t *p)
  {
!         sigfd_proc_state_t *pstate = p->p_sigfd;
!         sigfd_poll_waiter_t *pw;
          list_t *lst;
  
          ASSERT(MUTEX_HELD(&p->p_lock));
+         ASSERT(pstate != NULL);
  
!         lst = &pstate->sigfd_list;
!         while ((pw = list_remove_head(lst)) != NULL) {
!                 signalfd_state_t *state = pw->spw_state;
! 
!                 pw->spw_state = NULL;
!                 signalfd_state_release(state, B_FALSE);
! 
!                 pollwakeup(&pw->spw_pollhd, POLLERR);
!                 pollhead_clean(&pw->spw_pollhd);
!                 kmem_free(pw, sizeof (*pw));
          }
+         list_destroy(lst);
+ 
+         p->p_sigfd = NULL;
+         kmem_free(pstate, sizeof (*pstate));
  }
  
  static void
  signalfd_exit_helper(void)
  {
          proc_t *p = curproc;
  
          mutex_enter(&p->p_lock);
          signalfd_wake_list_cleanup(p);
          mutex_exit(&p->p_lock);
  }
  
  /*
+  * Perform pollwake for a sigfd_poll_waiter_t entry.
+  * Thanks to the strict and conflicting lock orders required for signalfd_poll
+  * (pc_lock before p_lock) and signalfd_pollwake_cb (p_lock before pc_lock),
+  * this is relegated to a taskq to avoid deadlock.
+  */
+ static void
+ signalfd_wake_task(void *arg)
+ {
+         sigfd_poll_waiter_t *pw = arg;
+         signalfd_state_t *state = pw->spw_state;
+ 
+         pw->spw_state = NULL;
+         signalfd_state_release(state, B_FALSE);
+         pollwakeup(&pw->spw_pollhd, pw->spw_pollev);
+         pollhead_clean(&pw->spw_pollhd);
+         kmem_free(pw, sizeof (*pw));
+ }
+ 
+ /*
   * Called every time a signal is delivered to the process so that we can
   * see if any signal stream needs a pollwakeup. We maintain a list of
   * signal state elements so that we don't have to look at every file descriptor
   * on the process. If necessary, a further optimization would be to maintain a
   * signal set mask that is a union of all of the sets in the list so that
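The hold/release discipline on signalfd_state_t above follows a common
pattern: holds are taken only while the object is locked and still valid, and
whichever caller drops the final reference frees the object, which by then
must already be invalidated. A userland-compilable sketch of the same pattern
(all names here are hypothetical, not part of the driver):

        #include <pthread.h>
        #include <stdlib.h>
        #include <stdbool.h>
        #include <assert.h>

        typedef struct refobj {
                pthread_mutex_t ro_lock;
                unsigned int ro_count;
                bool ro_valid;          /* mirrors sfd_valid */
        } refobj_t;

        static void
        refobj_hold_locked(refobj_t *ro)
        {
                /* Caller holds ro_lock; object must still be open/valid. */
                assert(ro->ro_count > 0 && ro->ro_valid);
                ro->ro_count++;
        }

        static void
        refobj_release(refobj_t *ro, bool invalidate)
        {
                (void) pthread_mutex_lock(&ro->ro_lock);
                if (invalidate)
                        ro->ro_valid = false;
                assert(ro->ro_count > 0);
                if (--ro->ro_count == 0) {
                        /* Last ref: no one else can reach the object now. */
                        assert(!ro->ro_valid);
                        (void) pthread_mutex_unlock(&ro->ro_lock);
                        free(ro);
                        return;
                }
                (void) pthread_mutex_unlock(&ro->ro_lock);
        }

In the driver, "invalidate" corresponds to the close path, which is why a
poll waiter releasing its reference after close can still clean up safely.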
*** 252,320 ****
   */
  static void
  signalfd_pollwake_cb(void *arg0, int sig)
  {
          proc_t *p = (proc_t *)arg0;
          list_t *lst;
!         sigfd_wake_list_t *wlp;
  
          ASSERT(MUTEX_HELD(&p->p_lock));
  
!         if (p->p_sigfd == NULL)
!                 return;
  
-         lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
-         wlp = list_head(lst);
-         while (wlp != NULL) {
-                 signalfd_state_t *state = wlp->sigfd_wl_state;
- 
                  mutex_enter(&state->sfd_lock);
! 
!                 if (sigismember(&state->sfd_set, sig) &&
!                     state->sfd_pollhd.ph_list != NULL) {
!                         sigfd_wake_list_t *tmp = wlp;
! 
!                         /* remove it from the list */
!                         wlp = list_next(lst, wlp);
!                         signalfd_wake_rm(lst, tmp);
! 
!                         mutex_exit(&state->sfd_lock);
!                         pollwakeup(&state->sfd_pollhd, POLLRDNORM | POLLIN);
                  } else {
                          mutex_exit(&state->sfd_lock);
!                         wlp = list_next(lst, wlp);
                  }
          }
  }
  
  _NOTE(ARGSUSED(1))
  static int
  signalfd_open(dev_t *devp, int flag, int otyp, cred_t *cred_p)
  {
!         signalfd_state_t *state;
          major_t major = getemajor(*devp);
          minor_t minor = getminor(*devp);
  
          if (minor != SIGNALFDMNRN_SIGNALFD)
                  return (ENXIO);
  
          mutex_enter(&signalfd_lock);
  
          minor = (minor_t)id_allocff(signalfd_minor);
- 
          if (ddi_soft_state_zalloc(signalfd_softstate, minor) != DDI_SUCCESS) {
                  id_free(signalfd_minor, minor);
                  mutex_exit(&signalfd_lock);
                  return (ENODEV);
          }
  
!         state = ddi_get_soft_state(signalfd_softstate, minor);
          *devp = makedevice(major, minor);
  
-         state->sfd_next = signalfd_state;
-         signalfd_state = state;
- 
          mutex_exit(&signalfd_lock);
  
          return (0);
  }
--- 298,375 ----
   */
  static void
  signalfd_pollwake_cb(void *arg0, int sig)
  {
          proc_t *p = (proc_t *)arg0;
+         sigfd_proc_state_t *pstate = (sigfd_proc_state_t *)p->p_sigfd;
          list_t *lst;
!         sigfd_poll_waiter_t *pw;
  
          ASSERT(MUTEX_HELD(&p->p_lock));
+         ASSERT(pstate != NULL);
  
!         lst = &pstate->sigfd_list;
!         pw = list_head(lst);
!         while (pw != NULL) {
!                 signalfd_state_t *state = pw->spw_state;
!                 sigfd_poll_waiter_t *next;
  
                  mutex_enter(&state->sfd_lock);
!                 if (!state->sfd_valid) {
!                         pw->spw_pollev = POLLERR;
!                 } else if (sigismember(&state->sfd_set, sig)) {
!                         pw->spw_pollev = POLLRDNORM | POLLIN;
                  } else {
                          mutex_exit(&state->sfd_lock);
!                         pw = list_next(lst, pw);
!                         continue;
                  }
+                 mutex_exit(&state->sfd_lock);
+ 
+                 /*
+                  * Pull the sigfd_poll_waiter_t out of the list and dispatch
+                  * it to perform a pollwake. This cannot be done
+                  * synchronously since signalfd_poll and signalfd_pollwake_cb
+                  * have conflicting lock orders which can deadlock.
+                  */
+                 next = list_next(lst, pw);
+                 list_remove(lst, pw);
+                 taskq_dispatch_ent(signalfd_wakeq, signalfd_wake_task, pw, 0,
+                     &pw->spw_taskent);
+                 pw = next;
          }
  }
  
  _NOTE(ARGSUSED(1))
  static int
  signalfd_open(dev_t *devp, int flag, int otyp, cred_t *cred_p)
  {
!         signalfd_state_t *state, **sstate;
          major_t major = getemajor(*devp);
          minor_t minor = getminor(*devp);
  
          if (minor != SIGNALFDMNRN_SIGNALFD)
                  return (ENXIO);
  
          mutex_enter(&signalfd_lock);
  
          minor = (minor_t)id_allocff(signalfd_minor);
          if (ddi_soft_state_zalloc(signalfd_softstate, minor) != DDI_SUCCESS) {
                  id_free(signalfd_minor, minor);
                  mutex_exit(&signalfd_lock);
                  return (ENODEV);
          }
  
!         state = kmem_zalloc(sizeof (*state), KM_SLEEP);
!         state->sfd_valid = B_TRUE;
!         state->sfd_count = 1;
!         list_insert_head(&signalfd_state, (void *)state);
! 
!         sstate = ddi_get_soft_state(signalfd_softstate, minor);
!         *sstate = state;
          *devp = makedevice(major, minor);
  
          mutex_exit(&signalfd_lock);
  
          return (0);
  }
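The dispatch above relies on the preallocated taskq_ent_t embedded in each
sigfd_poll_waiter_t, so queueing the wake-up can never fail or sleep while
p_lock is held. A minimal kernel-style sketch of that deferred-dispatch
pattern, with all "example_" names hypothetical (not part of this driver):

        #include <sys/taskq.h>
        #include <sys/taskq_impl.h>
        #include <sys/kmem.h>

        typedef struct example_work {
                taskq_ent_t ew_taskent; /* embedded entry; no alloc needed */
                int ew_payload;
        } example_work_t;

        static taskq_t *example_wakeq;  /* created as in _attach below */

        static void
        example_task(void *arg)
        {
                example_work_t *ew = arg;

                /* Runs with no caller-held locks; safe to lock freely here. */
                kmem_free(ew, sizeof (*ew));
        }

        static void
        example_defer(example_work_t *ew)
        {
                /*
                 * Dispatch via the embedded taskq_ent_t. This never blocks,
                 * which is why it is safe while deadlock-prone locks are
                 * still held by the caller.
                 */
                taskq_dispatch_ent(example_wakeq, example_task, ew, 0,
                    &ew->ew_taskent);
        }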
*** 403,412 ****
--- 458,470 ----
          DTRACE_PROC2(signal__clear, int, ret, ksiginfo_t *, infop);
          lwp->lwp_cursig = 0;
          lwp->lwp_extsig = 0;
          mutex_exit(&p->p_lock);
  
+         if (PROC_IS_BRANDED(p) && BROP(p)->b_sigfd_translate)
+                 BROP(p)->b_sigfd_translate(infop);
+ 
          /* Convert k_siginfo into external, datamodel independent, struct. */
          bzero(ssp, sizeof (*ssp));
          ssp->ssi_signo = infop->si_signo;
          ssp->ssi_errno = infop->si_errno;
          ssp->ssi_code = infop->si_code;
*** 437,457 ****
   */
  _NOTE(ARGSUSED(2))
  static int
  signalfd_read(dev_t dev, uio_t *uio, cred_t *cr)
  {
!         signalfd_state_t *state;
          minor_t minor = getminor(dev);
          boolean_t block = B_TRUE;
          k_sigset_t set;
          boolean_t got_one = B_FALSE;
          int res;
  
          if (uio->uio_resid < sizeof (signalfd_siginfo_t))
                  return (EINVAL);
  
!         state = ddi_get_soft_state(signalfd_softstate, minor);
  
          if (uio->uio_fmode & (FNDELAY|FNONBLOCK))
                  block = B_FALSE;
  
          mutex_enter(&state->sfd_lock);
--- 495,516 ----
   */
  _NOTE(ARGSUSED(2))
  static int
  signalfd_read(dev_t dev, uio_t *uio, cred_t *cr)
  {
!         signalfd_state_t *state, **sstate;
          minor_t minor = getminor(dev);
          boolean_t block = B_TRUE;
          k_sigset_t set;
          boolean_t got_one = B_FALSE;
          int res;
  
          if (uio->uio_resid < sizeof (signalfd_siginfo_t))
                  return (EINVAL);
  
!         sstate = ddi_get_soft_state(signalfd_softstate, minor);
!         state = *sstate;
  
          if (uio->uio_fmode & (FNDELAY|FNONBLOCK))
                  block = B_FALSE;
  
          mutex_enter(&state->sfd_lock);
*** 460,478 ****
  
          if (sigisempty(&set))
                  return (set_errno(EINVAL));
  
          do {
!                 res = consume_signal(state->sfd_set, uio, block);
!                 if (res == 0)
!                         got_one = B_TRUE;
  
                  /*
!                  * After consuming one signal we won't block trying to consume
!                  * further signals.
                   */
                  block = B_FALSE;
          } while (res == 0 && uio->uio_resid >= sizeof (signalfd_siginfo_t));
  
          if (got_one)
                  res = 0;
--- 519,548 ----
  
          if (sigisempty(&set))
                  return (set_errno(EINVAL));
  
          do {
!                 res = consume_signal(set, uio, block);
  
+                 if (res == 0) {
                          /*
!                          * After consuming one signal, do not block while
!                          * trying to consume more.
                           */
+                         got_one = B_TRUE;
                          block = B_FALSE;
+ 
+                         /*
+                          * Refresh the matching signal set in case it was
+                          * updated during the wait.
+                          */
+                         mutex_enter(&state->sfd_lock);
+                         set = state->sfd_set;
+                         mutex_exit(&state->sfd_lock);
+                         if (sigisempty(&set))
+                                 break;
+                 }
          } while (res == 0 && uio->uio_resid >= sizeof (signalfd_siginfo_t));
  
          if (got_one)
                  res = 0;
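The reshaped loop above blocks for at most the first signal, then drains
without blocking, re-reading the shared mask under its lock each iteration in
case a SIGNALFDIOC_MASK ioctl changed it while the reader slept. A schematic
userland model of the same loop, where load_mask() and consume_one() are
hypothetical stand-ins for the locked re-read of sfd_set and for
consume_signal():

        #include <signal.h>
        #include <errno.h>
        #include <time.h>
        #include <pthread.h>
        #include <stdbool.h>

        static pthread_mutex_t mask_lock = PTHREAD_MUTEX_INITIALIZER;
        static sigset_t shared_mask;    /* updated as SIGNALFDIOC_MASK would */

        static void
        load_mask(sigset_t *mask)
        {
                (void) pthread_mutex_lock(&mask_lock);
                *mask = shared_mask;
                (void) pthread_mutex_unlock(&mask_lock);
        }

        /* Stand-in for consume_signal(): take one pending matching signal. */
        static int
        consume_one(const sigset_t *mask, bool block)
        {
                siginfo_t si;
                struct timespec zero = { 0, 0 };
                int sig = block ? sigwaitinfo(mask, &si) :
                    sigtimedwait(mask, &si, &zero);

                return (sig < 0 ? errno : 0);
        }

        /* Consume at least one signal, then drain without blocking. */
        static int
        drain(bool block)
        {
                sigset_t mask;
                bool got_one = false;
                int res;

                load_mask(&mask);
                do {
                        res = consume_one(&mask, block);
                        if (res == 0) {
                                got_one = true;
                                block = false;
                                /* Refresh the mask, as the kernel loop does. */
                                load_mask(&mask);
                        }
                } while (res == 0);

                return (got_one ? 0 : res);
        }

(The kernel version additionally bails out of the loop if the refreshed set
became empty, and stops when the remaining uio space cannot hold another
signalfd_siginfo_t.)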
*** 497,555 ****
  _NOTE(ARGSUSED(4))
  static int
  signalfd_poll(dev_t dev, short events, int anyyet, short *reventsp,
      struct pollhead **phpp)
  {
!         signalfd_state_t *state;
          minor_t minor = getminor(dev);
          kthread_t *t = curthread;
          proc_t *p = ttoproc(t);
          short revents = 0;
  
!         state = ddi_get_soft_state(signalfd_softstate, minor);
  
          mutex_enter(&state->sfd_lock);
  
          if (signalfd_sig_pending(p, t, state->sfd_set) != 0)
                  revents |= POLLRDNORM | POLLIN;
  
          mutex_exit(&state->sfd_lock);
  
          if (!(*reventsp = revents & events) && !anyyet) {
!                 *phpp = &state->sfd_pollhd;
  
                  /*
                   * Enable pollwakeup handling.
                   */
!                 if (p->p_sigfd == NULL) {
!                         sigfd_proc_state_t *pstate;
  
!                         pstate = kmem_zalloc(sizeof (sigfd_proc_state_t),
!                             KM_SLEEP);
                          list_create(&pstate->sigfd_list,
!                             sizeof (sigfd_wake_list_t),
!                             offsetof(sigfd_wake_list_t, sigfd_wl_lst));
  
                          mutex_enter(&p->p_lock);
-                         /* check again now that we're locked */
                          if (p->p_sigfd == NULL) {
                                  p->p_sigfd = pstate;
                          } else {
                                  /* someone beat us to it */
                                  list_destroy(&pstate->sigfd_list);
!                                 kmem_free(pstate, sizeof (sigfd_proc_state_t));
                          }
-                         mutex_exit(&p->p_lock);
                  }
  
!                 mutex_enter(&p->p_lock);
!                 if (((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb ==
!                     NULL) {
!                         ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb =
!                             signalfd_pollwake_cb;
!                 }
!                 signalfd_wake_list_add(state);
                  mutex_exit(&p->p_lock);
          }
  
          return (0);
  }
--- 567,623 ----
  _NOTE(ARGSUSED(4))
  static int
  signalfd_poll(dev_t dev, short events, int anyyet, short *reventsp,
      struct pollhead **phpp)
  {
!         signalfd_state_t *state, **sstate;
          minor_t minor = getminor(dev);
          kthread_t *t = curthread;
          proc_t *p = ttoproc(t);
          short revents = 0;
  
!         sstate = ddi_get_soft_state(signalfd_softstate, minor);
!         state = *sstate;
  
          mutex_enter(&state->sfd_lock);
  
          if (signalfd_sig_pending(p, t, state->sfd_set) != 0)
                  revents |= POLLRDNORM | POLLIN;
  
          mutex_exit(&state->sfd_lock);
  
          if (!(*reventsp = revents & events) && !anyyet) {
!                 sigfd_proc_state_t *pstate;
!                 sigfd_poll_waiter_t *pw;
  
                  /*
                   * Enable pollwakeup handling.
                   */
!                 mutex_enter(&p->p_lock);
!                 if ((pstate = (sigfd_proc_state_t *)p->p_sigfd) == NULL) {
!                         mutex_exit(&p->p_lock);
! 
!                         pstate = kmem_zalloc(sizeof (*pstate), KM_SLEEP);
                          list_create(&pstate->sigfd_list,
!                             sizeof (sigfd_poll_waiter_t),
!                             offsetof(sigfd_poll_waiter_t, spw_list));
!                         pstate->sigfd_pollwake_cb = signalfd_pollwake_cb;
  
+                         /* Check again, after blocking for the alloc. */
                          mutex_enter(&p->p_lock);
                          if (p->p_sigfd == NULL) {
                                  p->p_sigfd = pstate;
                          } else {
                                  /* someone beat us to it */
                                  list_destroy(&pstate->sigfd_list);
!                                 kmem_free(pstate, sizeof (*pstate));
!                                 pstate = p->p_sigfd;
                          }
                  }
  
!                 pw = signalfd_wake_list_add(pstate, state);
!                 *phpp = &pw->spw_pollhd;
                  mutex_exit(&p->p_lock);
          }
  
          return (0);
  }
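The pstate setup above is a check/alloc/re-check pattern: the lock is dropped
around the sleeping allocation, and the pointer is tested again once the lock
is reacquired, discarding the allocation if another thread raced ahead. The
same pattern in compact userland terms (names hypothetical):

        #include <pthread.h>
        #include <stdlib.h>

        static pthread_mutex_t g_lock = PTHREAD_MUTEX_INITIALIZER;
        static void *g_state;

        static void *
        get_state(size_t sz)
        {
                void *st;

                (void) pthread_mutex_lock(&g_lock);
                if ((st = g_state) == NULL) {
                        (void) pthread_mutex_unlock(&g_lock);
                        st = calloc(1, sz);     /* may block; lock not held */
                        (void) pthread_mutex_lock(&g_lock);
                        if (g_state == NULL) {
                                g_state = st;
                        } else {
                                free(st);       /* someone beat us to it */
                                st = g_state;
                        }
                }
                (void) pthread_mutex_unlock(&g_lock);
                return (st);
        }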
*** 556,570 ****
  _NOTE(ARGSUSED(4))
  static int
  signalfd_ioctl(dev_t dev, int cmd, intptr_t arg, int md, cred_t *cr, int *rv)
  {
!         signalfd_state_t *state;
          minor_t minor = getminor(dev);
          sigset_t mask;
  
!         state = ddi_get_soft_state(signalfd_softstate, minor);
  
          switch (cmd) {
          case SIGNALFDIOC_MASK:
                  if (ddi_copyin((caddr_t)arg, (caddr_t)&mask,
                      sizeof (sigset_t), md) != 0)
--- 624,639 ----
  _NOTE(ARGSUSED(4))
  static int
  signalfd_ioctl(dev_t dev, int cmd, intptr_t arg, int md, cred_t *cr, int *rv)
  {
!         signalfd_state_t *state, **sstate;
          minor_t minor = getminor(dev);
          sigset_t mask;
  
!         sstate = ddi_get_soft_state(signalfd_softstate, minor);
!         state = *sstate;
  
          switch (cmd) {
          case SIGNALFDIOC_MASK:
                  if (ddi_copyin((caddr_t)arg, (caddr_t)&mask,
                      sizeof (sigset_t), md) != 0)
*** 585,621 ****
  _NOTE(ARGSUSED(1))
  static int
  signalfd_close(dev_t dev, int flag, int otyp, cred_t *cred_p)
  {
!         signalfd_state_t *state, **sp;
          minor_t minor = getminor(dev);
          proc_t *p = curproc;
  
!         state = ddi_get_soft_state(signalfd_softstate, minor);
  
!         if (state->sfd_pollhd.ph_list != NULL) {
!                 pollwakeup(&state->sfd_pollhd, POLLERR);
!                 pollhead_clean(&state->sfd_pollhd);
!         }
! 
!         /* Make sure our state is removed from our proc's pollwake list. */
          mutex_enter(&p->p_lock);
!         signalfd_wake_list_rm(p, state);
          mutex_exit(&p->p_lock);
  
          mutex_enter(&signalfd_lock);
  
!         /* Remove our state from our global list. */
!         for (sp = &signalfd_state; *sp != state; sp = &((*sp)->sfd_next))
!                 VERIFY(*sp != NULL);
! 
!         *sp = (*sp)->sfd_next;
! 
          ddi_soft_state_free(signalfd_softstate, minor);
          id_free(signalfd_minor, minor);
  
          mutex_exit(&signalfd_lock);
  
          return (0);
  }
--- 654,697 ----
  _NOTE(ARGSUSED(1))
  static int
  signalfd_close(dev_t dev, int flag, int otyp, cred_t *cred_p)
  {
!         signalfd_state_t *state, **sstate;
!         sigfd_poll_waiter_t *pw = NULL;
          minor_t minor = getminor(dev);
          proc_t *p = curproc;
  
!         sstate = ddi_get_soft_state(signalfd_softstate, minor);
!         state = *sstate;
  
!         /* Make sure state is removed from this proc's pollwake list. */
          mutex_enter(&p->p_lock);
!         if (p->p_sigfd != NULL) {
!                 sigfd_proc_state_t *pstate = p->p_sigfd;
! 
!                 pw = signalfd_wake_list_rm(pstate, state);
!                 if (list_is_empty(&pstate->sigfd_list)) {
!                         signalfd_wake_list_cleanup(p);
!                 }
!         }
          mutex_exit(&p->p_lock);
  
+         if (pw != NULL) {
+                 pollwakeup(&pw->spw_pollhd, POLLERR);
+                 pollhead_clean(&pw->spw_pollhd);
+                 kmem_free(pw, sizeof (*pw));
+         }
+ 
          mutex_enter(&signalfd_lock);
!         *sstate = NULL;
          ddi_soft_state_free(signalfd_softstate, minor);
          id_free(signalfd_minor, minor);
+         signalfd_state_release(state, B_TRUE);
+ 
          mutex_exit(&signalfd_lock);
  
          return (0);
  }
*** 633,643 ****
                  mutex_exit(&signalfd_lock);
                  return (DDI_FAILURE);
          }
  
          if (ddi_soft_state_init(&signalfd_softstate,
!             sizeof (signalfd_state_t), 0) != 0) {
                  cmn_err(CE_WARN, "signalfd failed to create soft state");
                  id_space_destroy(signalfd_minor);
                  mutex_exit(&signalfd_lock);
                  return (DDI_FAILURE);
          }
--- 709,719 ----
                  mutex_exit(&signalfd_lock);
                  return (DDI_FAILURE);
          }
  
          if (ddi_soft_state_init(&signalfd_softstate,
!             sizeof (signalfd_state_t *), 0) != 0) {
                  cmn_err(CE_WARN, "signalfd failed to create soft state");
                  id_space_destroy(signalfd_minor);
                  mutex_exit(&signalfd_lock);
                  return (DDI_FAILURE);
          }
*** 654,663 ****
--- 730,745 ----
          ddi_report_dev(devi);
          signalfd_devi = devi;
  
          sigfd_exit_helper = signalfd_exit_helper;
  
+         list_create(&signalfd_state, sizeof (signalfd_state_t),
+             offsetof(signalfd_state_t, sfd_list));
+ 
+         signalfd_wakeq = taskq_create("signalfd_wake", 1, minclsyspri,
+             0, INT_MAX, TASKQ_PREPOPULATE);
+ 
          mutex_exit(&signalfd_lock);
  
          return (DDI_SUCCESS);
  }
*** 671,684 ****
          default:
                  return (DDI_FAILURE);
          }
  
-         /* list should be empty */
-         VERIFY(signalfd_state == NULL);
- 
          mutex_enter(&signalfd_lock);
          id_space_destroy(signalfd_minor);
  
          ddi_remove_minor_node(signalfd_devi, NULL);
          signalfd_devi = NULL;
          sigfd_exit_helper = NULL;
--- 753,781 ----
          default:
                  return (DDI_FAILURE);
          }
  
          mutex_enter(&signalfd_lock);
+ 
+         if (!list_is_empty(&signalfd_state)) {
+                 /*
+                  * There are dangling poll waiters holding signalfd_state_t
+                  * entries on the global list. Detach is not possible until
+                  * they purge themselves.
+                  */
+                 mutex_exit(&signalfd_lock);
+                 return (DDI_FAILURE);
+         }
+         list_destroy(&signalfd_state);
+ 
+         /*
+          * With no remaining entries in the signalfd_state list, the wake
+          * taskq should be empty with no possibility for new entries.
+          */
+         taskq_destroy(signalfd_wakeq);
+ 
          id_space_destroy(signalfd_minor);
  
          ddi_remove_minor_node(signalfd_devi, NULL);
          signalfd_devi = NULL;
          sigfd_exit_helper = NULL;