OS-5464 signalfd deadlock on pollwakeup
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
OS-5370 panic in signalfd
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
OS-3742 lxbrand add support for signalfd
OS-4382 remove obsolete brand hooks added during lx development
*** 8,117 ****
* source. A copy of the CDDL is also available via the Internet at
* http://www.illumos.org/license/CDDL.
*/
/*
! * Copyright 2015 Joyent, Inc.
*/
/*
* Support for the signalfd facility, a Linux-borne facility for
* file descriptor-based synchronous signal consumption.
*
* As described on the signalfd(3C) man page, the general idea behind these
* file descriptors is that they can be used to synchronously consume signals
! * via the read(2) syscall. That capability already exists with the
! * sigwaitinfo(3C) function but the key advantage of signalfd is that, because
! * it is file descriptor based, poll(2) can be used to determine when signals
! * are available to be consumed.
*
! * The general implementation uses signalfd_state to hold both the signal set
! * and poll head for an open file descriptor. Because a process can be using
! * different sigfds with different signal sets, each signalfd_state poll head
! * can be thought of as an independent signal stream and the thread(s) waiting
! * on that stream will get poll notification when any signal in the
! * corresponding set is received.
*
! * The sigfd_proc_state_t struct lives on the proc_t and maintains per-proc
! * state for function callbacks and data when the proc needs to do work during
! * signal delivery for pollwakeup.
*
! * The read side of the implementation is straightforward and mimics the
! * kernel behavior for sigtimedwait(). Signals continue to live on either
! * the proc's p_sig, or thread's t_sig, member. Read consumes the signal so
! * that it is no longer pending.
*
! * The poll side is more complex since all of the sigfds on the process need
! * to be examined every time a signal is delivered to the process in order to
! * pollwake any thread waiting in poll for that signal.
*
! * Because it is likely that a process will only be using one, or a few, sigfds,
! * but many total file descriptors, we maintain a list of sigfds which need
! * pollwakeup. The list lives on the proc's p_sigfd struct. In this way only
! * zero, or a few, of the state structs will need to be examined every time a
! * signal is delivered to the process, instead of having to examine all of the
! * file descriptors to find the state structs. When a state struct with a
! * matching signal set is found then pollwakeup is called.
*
! * The sigfd_list is self-cleaning; as signalfd_pollwake_cb is called, the list
! * will clear out on its own. There is an exit helper (signalfd_exit_helper)
! * which cleans up any remaining per-proc state when the process exits.
*
! * The main complexity with signalfd is the interaction of forking and polling.
! * This interaction is complex because now two processes have a fd that
! * references the same dev_t (and its associated signalfd_state), but signals
! * go to only one of those processes. Also, we don't know when one of the
! * processes closes its fd because our 'close' entry point is only called when
! * the last fd is closed (which could be by either process).
*
! * Because the state struct is referenced by both file descriptors, and the
! * state struct represents a signal stream needing a pollwakeup, if both
! * processes were polling then both processes would get a pollwakeup when a
! * signal arrives for either process (that is, the pollhead is associated with
! * our dev_t so when a signal arrives the pollwakeup wakes up all waiters).
*
! * Fortunately this is not a common problem in practice, but the implementation
! * attempts to mitigate unexpected behavior. The typical behavior is that the
! * parent has been polling the signalfd (which is why it was open in the first
! * place) and the parent might have a pending signalfd_state (with the
! * pollhead) on its per-process sigfd_list. After the fork the child will
! * simply close that fd (among others) as part of the typical fork/close/exec
! * pattern. Because the child will never poll that fd, it will never get any
! * state onto its own sigfd_list (the child starts with a null list). The
! * intention is that the child sees no pollwakeup activity for signals unless
! * it explicitly reinvokes poll on the sigfd.
*
! * As background, there are two primary polling cases to consider when the
! * parent process forks:
! * 1) If any thread is blocked in poll(2) then both the parent and child will
! * return from the poll syscall with EINTR. This means that if either
! * process wants to re-poll on a sigfd then it needs to re-run poll and
! * would come back in to the signalfd_poll entry point. The parent would
! * already have the dev_t's state on its sigfd_list and the child would not
! * have anything there unless it called poll again on its fd.
! * 2) If the process is using /dev/poll(7D) then the polling info is being
! * cached by the poll device and the process might not currently be blocked
! * on anything polling related. A subsequent DP_POLL ioctl will not invoke
! * our signalfd_poll entry point again. Because the parent still has its
! * sigfd_list setup, an incoming signal will hit our signalfd_pollwake_cb
! * entry point, which in turn calls pollwake, and /dev/poll will do the
! * right thing on DP_POLL. The child will not have a sigfd_list yet so the
! * signal will not cause a pollwakeup. The dp code does its own handling for
! * cleaning up its cache.
*
! * This leaves only one odd corner case. If the parent and child both use
! * the dup-ed sigfd to poll then when a signal is delivered to either process
! * there is no way to determine which one should get the pollwakeup (since
! * both processes will be queued on the same signal stream poll head). What
! * happens in this case is that both processes will return from poll, but only
! * one of them will actually have a signal to read. The other will return
! * from read with EAGAIN, or block. This case is actually similar to the
! * situation within a single process which got two different sigfd's with the
! * same mask (or poll on two fd's that are dup-ed). Both would return from poll
! * when a signal arrives but only one read would consume the signal and the
! * other read would fail or block. Applications which poll on shared fd's
! * cannot assume that a subsequent read will actually obtain data.
*/
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/signalfd.h>
--- 8,101 ----
* source. A copy of the CDDL is also available via the Internet at
* http://www.illumos.org/license/CDDL.
*/
/*
! * Copyright 2016 Joyent, Inc.
*/
/*
* Support for the signalfd facility, a Linux-borne facility for
* file descriptor-based synchronous signal consumption.
*
* As described on the signalfd(3C) man page, the general idea behind these
* file descriptors is that they can be used to synchronously consume signals
! * via the read(2) syscall. While that capability already exists with the
! * sigwaitinfo(3C) function, signalfd holds an advantage since it is file
! * descriptor based: it is able to use the event facilities (poll(2),
! * /dev/poll, event ports) to notify interested parties when consumable
! * signals arrive.
*
! * The signalfd lifecycle begins when a process opens /dev/signalfd. A minor
! * number is allocated for it, along with an associated signalfd_state_t
! * struct, which is where the mask of desired signals resides.
*
! * Reading from the signalfd is straightforward and mimics the kernel behavior
! * for sigtimedwait(). Signals continue to live on either the proc's p_sig, or
! * thread's t_sig, member. During a read operation, those which match the mask
! * are consumed so they are no longer pending.
*
! * The poll side is more complex. Every time a signal is delivered, all of the
! * signalfds on the process need to be examined in order to pollwake threads
! * waiting for signal arrival.
*
! * When a thread polling on a signalfd requires a pollhead, several steps
! * must be taken to safely produce the proper result. A sigfd_proc_state_t is
! * created for the calling process if it does not yet exist. It holds a list
! * of sigfd_poll_waiter_t structures which associate pollheads with
! * signalfd_state_t entries. The sigfd_proc_state_t list is walked to find a
! * sigfd_poll_waiter_t matching the signalfd_state_t which corresponds to the
! * polled resource. If one is found, it is reused. Otherwise a new one is
! * created, incrementing the refcount on the signalfd_state_t, and it is added
! * to the sigfd_poll_waiter_t list.
*
! * The complications imposed by fork(2) are why the pollhead is stored in the
! * associated sigfd_poll_waiter_t instead of directly in the signalfd_state_t.
! * More than one process can hold a reference to the signalfd at a time but
! * arriving signals should wake only process-local pollers. Additionally,
! * signalfd_close is called only when the last referencing fd is closed, so
! * the releases performed by preceding threads are not visible to it. This
! * necessitates reference counting on the signalfd_state_t so it is able to
! * persist after close until all poll references have been cleansed. Doing so
! * ensures that blocked pollers which hold references to the signalfd_state_t
! * will be able to do clean-up after the descriptor itself has been closed.
*
! * When a signal arrives in a process polling on signalfd, signalfd_pollwake_cb
! * is called via the pointer in sigfd_proc_state_t. It will walk over the
! * sigfd_poll_waiter_t entries present in the list, searching for any
! * associated with a signalfd_state_t with a matching signal mask. The
! * approach of keeping the poller list in p_sigfd was chosen because a process
! * is likely to use few signalfds relative to its total file descriptors. It
! * reduces the work required for each received signal.
*
! * When matching sigfd_poll_waiter_t entries are encountered in the poller list
! * during signalfd_pollwake_cb, they are dispatched into signalfd_wakeq to
! * perform the pollwake. This is due to a lock ordering conflict between
! * signalfd_poll and signalfd_pollwake_cb. The former acquires
! * pollcache_t`pc_lock before proc_t`p_lock. The latter (via sigtoproc)
! * reverses the order. Deferring the pollwake into a taskq means it can be
! * performed without proc_t`p_lock held, avoiding the deadlock.
*
! * The sigfd_list is self-cleaning; as signalfd_pollwake_cb is called, the list
! * will clear out on its own. Any per-process state which remains at process
! * exit is cleaned up by the exit helper (signalfd_exit_helper).
*
! * The structures associated with signalfd state are designed to operate
! * correctly across fork, but there is one caveat that applies. Using
! * fork-shared signalfd descriptors in conjunction with fork-shared caching
! * poll descriptors (such as /dev/poll or event ports) will result in missed
! * poll wake-ups.
! * This is caused by the pollhead identity of signalfd descriptors
! * being dependent on the process they are polled from. Because it has a
! * thread-local cache, poll(2) is unaffected by this limitation.
*
! * Lock ordering:
*
! * 1. signalfd_lock
! * 2. signalfd_state_t`sfd_lock
! *
! * 1. proc_t`p_lock (to walk p_sigfd)
! * 2. signalfd_state_t`sfd_lock
! * 2a. signalfd_lock (after sfd_lock is dropped, when sfd_count falls to 0)
*/
#include <sys/ddi.h>
#include <sys/sunddi.h>
#include <sys/signalfd.h>
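
As a concrete illustration of the consumption model in the revised block
comment, here is a minimal userland sketch using the signalfd(3C) interface
and the signalfd_siginfo_t type from <sys/signalfd.h> (error handling elided;
consume_sigusr1 is an illustrative name, not part of this change):

	#include <sys/signalfd.h>
	#include <poll.h>
	#include <signal.h>
	#include <unistd.h>

	int
	consume_sigusr1(void)
	{
		sigset_t mask;
		struct pollfd pfd;
		signalfd_siginfo_t ssi;

		sigemptyset(&mask);
		sigaddset(&mask, SIGUSR1);
		/* Block the signal so it stays pending rather than delivered. */
		(void) sigprocmask(SIG_BLOCK, &mask, NULL);

		pfd.fd = signalfd(-1, &mask, 0);
		pfd.events = POLLIN;

		/* Wait in poll(2) until a signal in the set is consumable. */
		if (poll(&pfd, 1, -1) == 1 &&
		    read(pfd.fd, &ssi, sizeof (ssi)) == sizeof (ssi))
			return ((int)ssi.ssi_signo);
		return (-1);
	}
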
*** 121,246 ****
#include <sys/stat.h>
#include <sys/file.h>
#include <sys/schedctl.h>
#include <sys/id_space.h>
#include <sys/sdt.h>
typedef struct signalfd_state signalfd_state_t;
struct signalfd_state {
! kmutex_t sfd_lock; /* lock protecting state */
! pollhead_t sfd_pollhd; /* poll head */
k_sigset_t sfd_set; /* signals for this fd */
- signalfd_state_t *sfd_next; /* next state on global list */
};
/*
! * Internal global variables.
*/
! static kmutex_t signalfd_lock; /* lock protecting state */
static dev_info_t *signalfd_devi; /* device info */
static id_space_t *signalfd_minor; /* minor number arena */
static void *signalfd_softstate; /* softstate pointer */
! static signalfd_state_t *signalfd_state; /* global list of state */
! /*
! * If we don't already have an entry in the proc's list for this state, add one.
! */
static void
! signalfd_wake_list_add(signalfd_state_t *state)
{
! proc_t *p = curproc;
! list_t *lst;
! sigfd_wake_list_t *wlp;
! ASSERT(MUTEX_HELD(&p->p_lock));
! ASSERT(p->p_sigfd != NULL);
! lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
! for (wlp = list_head(lst); wlp != NULL; wlp = list_next(lst, wlp)) {
! if (wlp->sigfd_wl_state == state)
! break;
}
! if (wlp == NULL) {
! wlp = kmem_zalloc(sizeof (sigfd_wake_list_t), KM_SLEEP);
! wlp->sigfd_wl_state = state;
! list_insert_head(lst, wlp);
}
}
! static void
! signalfd_wake_rm(list_t *lst, sigfd_wake_list_t *wlp)
{
! list_remove(lst, wlp);
! kmem_free(wlp, sizeof (sigfd_wake_list_t));
! }
! static void
! signalfd_wake_list_rm(proc_t *p, signalfd_state_t *state)
! {
! sigfd_wake_list_t *wlp;
! list_t *lst;
! ASSERT(MUTEX_HELD(&p->p_lock));
! if (p->p_sigfd == NULL)
! return;
! lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
! for (wlp = list_head(lst); wlp != NULL; wlp = list_next(lst, wlp)) {
! if (wlp->sigfd_wl_state == state) {
! signalfd_wake_rm(lst, wlp);
break;
}
}
! if (list_is_empty(lst)) {
! ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb = NULL;
! list_destroy(lst);
! kmem_free(p->p_sigfd, sizeof (sigfd_proc_state_t));
! p->p_sigfd = NULL;
}
}
static void
signalfd_wake_list_cleanup(proc_t *p)
{
! sigfd_wake_list_t *wlp;
list_t *lst;
ASSERT(MUTEX_HELD(&p->p_lock));
! ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb = NULL;
! lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
! while (!list_is_empty(lst)) {
! wlp = (sigfd_wake_list_t *)list_remove_head(lst);
! kmem_free(wlp, sizeof (sigfd_wake_list_t));
}
}
static void
signalfd_exit_helper(void)
{
proc_t *p = curproc;
- list_t *lst;
- /* This being non-null is the only way we can get here */
- ASSERT(p->p_sigfd != NULL);
-
mutex_enter(&p->p_lock);
- lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
-
signalfd_wake_list_cleanup(p);
- list_destroy(lst);
- kmem_free(p->p_sigfd, sizeof (sigfd_proc_state_t));
- p->p_sigfd = NULL;
mutex_exit(&p->p_lock);
}
/*
* Called every time a signal is delivered to the process so that we can
* see if any signal stream needs a pollwakeup. We maintain a list of
* signal state elements so that we don't have to look at every file descriptor
* on the process. If necessary, a further optimization would be to maintain a
* signal set mask that is a union of all of the sets in the list so that
--- 105,292 ----
#include <sys/stat.h>
#include <sys/file.h>
#include <sys/schedctl.h>
#include <sys/id_space.h>
#include <sys/sdt.h>
+ #include <sys/brand.h>
+ #include <sys/disp.h>
+ #include <sys/taskq_impl.h>
typedef struct signalfd_state signalfd_state_t;
struct signalfd_state {
! list_node_t sfd_list; /* node in global list */
! kmutex_t sfd_lock; /* protects fields below */
! uint_t sfd_count; /* ref count */
! boolean_t sfd_valid; /* valid while open */
k_sigset_t sfd_set; /* signals for this fd */
};
+ typedef struct sigfd_poll_waiter {
+ list_node_t spw_list;
+ signalfd_state_t *spw_state;
+ pollhead_t spw_pollhd;
+ taskq_ent_t spw_taskent;
+ short spw_pollev;
+ } sigfd_poll_waiter_t;
+
/*
! * Protects global state in signalfd_devi, signalfd_minor, signalfd_softstate,
! * and signalfd_state (including sfd_list field of members)
*/
! static kmutex_t signalfd_lock;
static dev_info_t *signalfd_devi; /* device info */
static id_space_t *signalfd_minor; /* minor number arena */
static void *signalfd_softstate; /* softstate pointer */
! static list_t signalfd_state; /* global list of state */
! static taskq_t *signalfd_wakeq; /* pollwake event taskq */
!
static void
! signalfd_state_enter_locked(signalfd_state_t *state)
{
! ASSERT(MUTEX_HELD(&state->sfd_lock));
! ASSERT(state->sfd_count > 0);
! VERIFY(state->sfd_valid == B_TRUE);
! state->sfd_count++;
! }
! static void
! signalfd_state_release(signalfd_state_t *state, boolean_t force_invalidate)
! {
! mutex_enter(&state->sfd_lock);
!
! if (force_invalidate) {
! state->sfd_valid = B_FALSE;
}
! ASSERT(state->sfd_count > 0);
! if (state->sfd_count == 1) {
! VERIFY(state->sfd_valid == B_FALSE);
! mutex_exit(&state->sfd_lock);
! if (force_invalidate) {
! /*
! * The invalidation performed in signalfd_close is done
! * while signalfd_lock is held.
! */
! ASSERT(MUTEX_HELD(&signalfd_lock));
! list_remove(&signalfd_state, state);
! } else {
! ASSERT(MUTEX_NOT_HELD(&signalfd_lock));
! mutex_enter(&signalfd_lock);
! list_remove(&signalfd_state, state);
! mutex_exit(&signalfd_lock);
}
+ kmem_free(state, sizeof (*state));
+ return;
+ }
+ state->sfd_count--;
+ mutex_exit(&state->sfd_lock);
}
! static sigfd_poll_waiter_t *
! signalfd_wake_list_add(sigfd_proc_state_t *pstate, signalfd_state_t *state)
{
! list_t *lst = &pstate->sigfd_list;
! sigfd_poll_waiter_t *pw;
! for (pw = list_head(lst); pw != NULL; pw = list_next(lst, pw)) {
! if (pw->spw_state == state)
! break;
! }
! if (pw == NULL) {
! pw = kmem_zalloc(sizeof (*pw), KM_SLEEP);
! mutex_enter(&state->sfd_lock);
! signalfd_state_enter_locked(state);
! pw->spw_state = state;
! mutex_exit(&state->sfd_lock);
! list_insert_head(lst, pw);
! }
! return (pw);
! }
! static sigfd_poll_waiter_t *
! signalfd_wake_list_rm(sigfd_proc_state_t *pstate, signalfd_state_t *state)
! {
! list_t *lst = &pstate->sigfd_list;
! sigfd_poll_waiter_t *pw;
!
! for (pw = list_head(lst); pw != NULL; pw = list_next(lst, pw)) {
! if (pw->spw_state == state) {
break;
}
}
! if (pw != NULL) {
! list_remove(lst, pw);
! pw->spw_state = NULL;
! signalfd_state_release(state, B_FALSE);
}
+
+ return (pw);
}
static void
signalfd_wake_list_cleanup(proc_t *p)
{
! sigfd_proc_state_t *pstate = p->p_sigfd;
! sigfd_poll_waiter_t *pw;
list_t *lst;
ASSERT(MUTEX_HELD(&p->p_lock));
+ ASSERT(pstate != NULL);
! lst = &pstate->sigfd_list;
! while ((pw = list_remove_head(lst)) != NULL) {
! signalfd_state_t *state = pw->spw_state;
! pw->spw_state = NULL;
! signalfd_state_release(state, B_FALSE);
!
! pollwakeup(&pw->spw_pollhd, POLLERR);
! pollhead_clean(&pw->spw_pollhd);
! kmem_free(pw, sizeof (*pw));
}
+ list_destroy(lst);
+
+ p->p_sigfd = NULL;
+ kmem_free(pstate, sizeof (*pstate));
}
static void
signalfd_exit_helper(void)
{
proc_t *p = curproc;
mutex_enter(&p->p_lock);
signalfd_wake_list_cleanup(p);
mutex_exit(&p->p_lock);
}
/*
+ * Perform pollwake for a sigfd_poll_waiter_t entry.
+ * Thanks to the strict and conflicting lock orders required for signalfd_poll
+ * (pc_lock before p_lock) and signalfd_pollwake_cb (p_lock before pc_lock),
+ * this is relegated to a taskq to avoid deadlock.
+ */
+ static void
+ signalfd_wake_task(void *arg)
+ {
+ sigfd_poll_waiter_t *pw = arg;
+ signalfd_state_t *state = pw->spw_state;
+
+ pw->spw_state = NULL;
+ signalfd_state_release(state, B_FALSE);
+ pollwakeup(&pw->spw_pollhd, pw->spw_pollev);
+ pollhead_clean(&pw->spw_pollhd);
+ kmem_free(pw, sizeof (*pw));
+ }
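
The dispatch pattern used by signalfd_wake_task, preallocating the
taskq_ent_t inside the item itself so that taskq_dispatch_ent() cannot fail
or block while p_lock is held, generalizes beyond signalfd. A minimal
kernel-side sketch; work_t, defer_item(), process_item(), and handle() are
hypothetical names, not part of this change:

	typedef struct work {
		taskq_ent_t	w_tqent;	/* preallocated dispatch entry */
		void		*w_payload;
	} work_t;

	static void
	process_item(void *arg)
	{
		work_t *w = arg;

		/* Taskq context: none of the dispatcher's locks are held. */
		handle(w->w_payload);		/* hypothetical consumer */
		kmem_free(w, sizeof (*w));
	}

	static void
	defer_item(taskq_t *tq, work_t *w)
	{
		/*
		 * Unlike taskq_dispatch(9F), taskq_dispatch_ent() cannot
		 * fail for want of memory, so it is safe to call here with
		 * locks held that process_item() will later acquire.
		 */
		taskq_dispatch_ent(tq, process_item, w, 0, &w->w_tqent);
	}
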
+
+ /*
* Called every time a signal is delivered to the process so that we can
* see if any signal stream needs a pollwakeup. We maintain a list of
* signal state elements so that we don't have to look at every file descriptor
* on the process. If necessary, a further optimization would be to maintain a
* signal set mask that is a union of all of the sets in the list so that
*** 252,320 ****
*/
static void
signalfd_pollwake_cb(void *arg0, int sig)
{
proc_t *p = (proc_t *)arg0;
list_t *lst;
! sigfd_wake_list_t *wlp;
ASSERT(MUTEX_HELD(&p->p_lock));
! if (p->p_sigfd == NULL)
! return;
- lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
- wlp = list_head(lst);
- while (wlp != NULL) {
- signalfd_state_t *state = wlp->sigfd_wl_state;
-
mutex_enter(&state->sfd_lock);
!
! if (sigismember(&state->sfd_set, sig) &&
! state->sfd_pollhd.ph_list != NULL) {
! sigfd_wake_list_t *tmp = wlp;
!
! /* remove it from the list */
! wlp = list_next(lst, wlp);
! signalfd_wake_rm(lst, tmp);
!
! mutex_exit(&state->sfd_lock);
! pollwakeup(&state->sfd_pollhd, POLLRDNORM | POLLIN);
} else {
mutex_exit(&state->sfd_lock);
! wlp = list_next(lst, wlp);
}
}
}
_NOTE(ARGSUSED(1))
static int
signalfd_open(dev_t *devp, int flag, int otyp, cred_t *cred_p)
{
! signalfd_state_t *state;
major_t major = getemajor(*devp);
minor_t minor = getminor(*devp);
if (minor != SIGNALFDMNRN_SIGNALFD)
return (ENXIO);
mutex_enter(&signalfd_lock);
minor = (minor_t)id_allocff(signalfd_minor);
-
if (ddi_soft_state_zalloc(signalfd_softstate, minor) != DDI_SUCCESS) {
id_free(signalfd_minor, minor);
mutex_exit(&signalfd_lock);
return (ENODEV);
}
! state = ddi_get_soft_state(signalfd_softstate, minor);
*devp = makedevice(major, minor);
- state->sfd_next = signalfd_state;
- signalfd_state = state;
-
mutex_exit(&signalfd_lock);
return (0);
}
--- 298,375 ----
*/
static void
signalfd_pollwake_cb(void *arg0, int sig)
{
proc_t *p = (proc_t *)arg0;
+ sigfd_proc_state_t *pstate = (sigfd_proc_state_t *)p->p_sigfd;
list_t *lst;
! sigfd_poll_waiter_t *pw;
ASSERT(MUTEX_HELD(&p->p_lock));
+ ASSERT(pstate != NULL);
! lst = &pstate->sigfd_list;
! pw = list_head(lst);
! while (pw != NULL) {
! signalfd_state_t *state = pw->spw_state;
! sigfd_poll_waiter_t *next;
mutex_enter(&state->sfd_lock);
! if (!state->sfd_valid) {
! pw->spw_pollev = POLLERR;
! } else if (sigismember(&state->sfd_set, sig)) {
! pw->spw_pollev = POLLRDNORM | POLLIN;
} else {
mutex_exit(&state->sfd_lock);
! pw = list_next(lst, pw);
! continue;
}
+ mutex_exit(&state->sfd_lock);
+
+ /*
+ * Pull the sigfd_poll_waiter_t out of the list and dispatch it
+ * to perform a pollwake. This cannot be done synchronously
+ * since signalfd_poll and signalfd_pollwake_cb have
+ * conflicting lock orders which can deadlock.
+ */
+ next = list_next(lst, pw);
+ list_remove(lst, pw);
+ taskq_dispatch_ent(signalfd_wakeq, signalfd_wake_task, pw, 0,
+ &pw->spw_taskent);
+ pw = next;
}
}
_NOTE(ARGSUSED(1))
static int
signalfd_open(dev_t *devp, int flag, int otyp, cred_t *cred_p)
{
! signalfd_state_t *state, **sstate;
major_t major = getemajor(*devp);
minor_t minor = getminor(*devp);
if (minor != SIGNALFDMNRN_SIGNALFD)
return (ENXIO);
mutex_enter(&signalfd_lock);
minor = (minor_t)id_allocff(signalfd_minor);
if (ddi_soft_state_zalloc(signalfd_softstate, minor) != DDI_SUCCESS) {
id_free(signalfd_minor, minor);
mutex_exit(&signalfd_lock);
return (ENODEV);
}
! state = kmem_zalloc(sizeof (*state), KM_SLEEP);
! state->sfd_valid = B_TRUE;
! state->sfd_count = 1;
! list_insert_head(&signalfd_state, (void *)state);
!
! sstate = ddi_get_soft_state(signalfd_softstate, minor);
! *sstate = state;
*devp = makedevice(major, minor);
mutex_exit(&signalfd_lock);
return (0);
}
*** 403,412 ****
--- 458,470 ----
DTRACE_PROC2(signal__clear, int, ret, ksiginfo_t *, infop);
lwp->lwp_cursig = 0;
lwp->lwp_extsig = 0;
mutex_exit(&p->p_lock);
+ if (PROC_IS_BRANDED(p) && BROP(p)->b_sigfd_translate)
+ BROP(p)->b_sigfd_translate(infop);
+
/* Convert k_siginfo into external, datamodel independent, struct. */
bzero(ssp, sizeof (*ssp));
ssp->ssi_signo = infop->si_signo;
ssp->ssi_errno = infop->si_errno;
ssp->ssi_code = infop->si_code;
*** 437,457 ****
*/
_NOTE(ARGSUSED(2))
static int
signalfd_read(dev_t dev, uio_t *uio, cred_t *cr)
{
! signalfd_state_t *state;
minor_t minor = getminor(dev);
boolean_t block = B_TRUE;
k_sigset_t set;
boolean_t got_one = B_FALSE;
int res;
if (uio->uio_resid < sizeof (signalfd_siginfo_t))
return (EINVAL);
! state = ddi_get_soft_state(signalfd_softstate, minor);
if (uio->uio_fmode & (FNDELAY|FNONBLOCK))
block = B_FALSE;
mutex_enter(&state->sfd_lock);
--- 495,516 ----
*/
_NOTE(ARGSUSED(2))
static int
signalfd_read(dev_t dev, uio_t *uio, cred_t *cr)
{
! signalfd_state_t *state, **sstate;
minor_t minor = getminor(dev);
boolean_t block = B_TRUE;
k_sigset_t set;
boolean_t got_one = B_FALSE;
int res;
if (uio->uio_resid < sizeof (signalfd_siginfo_t))
return (EINVAL);
! sstate = ddi_get_soft_state(signalfd_softstate, minor);
! state = *sstate;
if (uio->uio_fmode & (FNDELAY|FNONBLOCK))
block = B_FALSE;
mutex_enter(&state->sfd_lock);
*** 460,478 ****
if (sigisempty(&set))
return (set_errno(EINVAL));
do {
! res = consume_signal(state->sfd_set, uio, block);
! if (res == 0)
! got_one = B_TRUE;
/*
! * After consuming one signal we won't block trying to consume
! * further signals.
*/
block = B_FALSE;
} while (res == 0 && uio->uio_resid >= sizeof (signalfd_siginfo_t));
if (got_one)
res = 0;
--- 519,548 ----
if (sigisempty(&set))
return (set_errno(EINVAL));
do {
! res = consume_signal(set, uio, block);
+ if (res == 0) {
/*
! * After consuming one signal, do not block while
! * trying to consume more.
*/
+ got_one = B_TRUE;
block = B_FALSE;
+
+ /*
+ * Refresh the matching signal set in case it was
+ * updated during the wait.
+ */
+ mutex_enter(&state->sfd_lock);
+ set = state->sfd_set;
+ mutex_exit(&state->sfd_lock);
+ if (sigisempty(&set))
+ break;
+ }
} while (res == 0 && uio->uio_resid >= sizeof (signalfd_siginfo_t));
if (got_one)
res = 0;
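
From userland, the consume loop above means a single read(2) can return
multiple signalfd_siginfo_t records when the supplied buffer has room for
them. A hedged sketch of a non-blocking drain, assuming O_NONBLOCK is set on
the descriptor; handle_signo() is a hypothetical consumer:

	signalfd_siginfo_t buf[8];
	ssize_t n;

	/* With O_NONBLOCK set, read(2) fails with EAGAIN once drained. */
	while ((n = read(sfd, buf, sizeof (buf))) > 0) {
		int i, nrec = (int)(n / sizeof (signalfd_siginfo_t));

		for (i = 0; i < nrec; i++)
			handle_signo(buf[i].ssi_signo);	/* hypothetical */
	}
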
*** 497,555 ****
_NOTE(ARGSUSED(4))
static int
signalfd_poll(dev_t dev, short events, int anyyet, short *reventsp,
struct pollhead **phpp)
{
! signalfd_state_t *state;
minor_t minor = getminor(dev);
kthread_t *t = curthread;
proc_t *p = ttoproc(t);
short revents = 0;
! state = ddi_get_soft_state(signalfd_softstate, minor);
mutex_enter(&state->sfd_lock);
if (signalfd_sig_pending(p, t, state->sfd_set) != 0)
revents |= POLLRDNORM | POLLIN;
mutex_exit(&state->sfd_lock);
if (!(*reventsp = revents & events) && !anyyet) {
! *phpp = &state->sfd_pollhd;
/*
* Enable pollwakeup handling.
*/
! if (p->p_sigfd == NULL) {
! sigfd_proc_state_t *pstate;
! pstate = kmem_zalloc(sizeof (sigfd_proc_state_t),
! KM_SLEEP);
list_create(&pstate->sigfd_list,
! sizeof (sigfd_wake_list_t),
! offsetof(sigfd_wake_list_t, sigfd_wl_lst));
mutex_enter(&p->p_lock);
- /* check again now that we're locked */
if (p->p_sigfd == NULL) {
p->p_sigfd = pstate;
} else {
/* someone beat us to it */
list_destroy(&pstate->sigfd_list);
! kmem_free(pstate, sizeof (sigfd_proc_state_t));
}
- mutex_exit(&p->p_lock);
}
! mutex_enter(&p->p_lock);
! if (((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb ==
! NULL) {
! ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb =
! signalfd_pollwake_cb;
! }
! signalfd_wake_list_add(state);
mutex_exit(&p->p_lock);
}
return (0);
}
--- 567,623 ----
_NOTE(ARGSUSED(4))
static int
signalfd_poll(dev_t dev, short events, int anyyet, short *reventsp,
struct pollhead **phpp)
{
! signalfd_state_t *state, **sstate;
minor_t minor = getminor(dev);
kthread_t *t = curthread;
proc_t *p = ttoproc(t);
short revents = 0;
! sstate = ddi_get_soft_state(signalfd_softstate, minor);
! state = *sstate;
mutex_enter(&state->sfd_lock);
if (signalfd_sig_pending(p, t, state->sfd_set) != 0)
revents |= POLLRDNORM | POLLIN;
mutex_exit(&state->sfd_lock);
if (!(*reventsp = revents & events) && !anyyet) {
! sigfd_proc_state_t *pstate;
! sigfd_poll_waiter_t *pw;
/*
* Enable pollwakeup handling.
*/
! mutex_enter(&p->p_lock);
! if ((pstate = (sigfd_proc_state_t *)p->p_sigfd) == NULL) {
! mutex_exit(&p->p_lock);
! pstate = kmem_zalloc(sizeof (*pstate), KM_SLEEP);
list_create(&pstate->sigfd_list,
! sizeof (sigfd_poll_waiter_t),
! offsetof(sigfd_poll_waiter_t, spw_list));
! pstate->sigfd_pollwake_cb = signalfd_pollwake_cb;
+ /* Check again, after blocking for the alloc. */
mutex_enter(&p->p_lock);
if (p->p_sigfd == NULL) {
p->p_sigfd = pstate;
} else {
/* someone beat us to it */
list_destroy(&pstate->sigfd_list);
! kmem_free(pstate, sizeof (*pstate));
! pstate = p->p_sigfd;
}
}
! pw = signalfd_wake_list_add(pstate, state);
! *phpp = &pw->spw_pollhd;
mutex_exit(&p->p_lock);
}
return (0);
}
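
The allocation dance in the new signalfd_poll (drop p_lock, perform the
KM_SLEEP allocation, retake the lock, and recheck p_sigfd in case another
thread installed it first) is the standard pattern for sleeping allocations
guarded by a lock that must not be held across a sleep. A minimal sketch with
a hypothetical foo_t hanging off the proc; p_foo is not a real proc_t field:

	mutex_enter(&p->p_lock);
	if (p->p_foo == NULL) {
		foo_t *f;

		mutex_exit(&p->p_lock);
		f = kmem_zalloc(sizeof (*f), KM_SLEEP);	/* may sleep */
		mutex_enter(&p->p_lock);
		if (p->p_foo == NULL) {
			p->p_foo = f;			/* we won the race */
		} else {
			kmem_free(f, sizeof (*f));	/* lost; discard ours */
		}
	}
	mutex_exit(&p->p_lock);
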
*** 556,570 ****
_NOTE(ARGSUSED(4))
static int
signalfd_ioctl(dev_t dev, int cmd, intptr_t arg, int md, cred_t *cr, int *rv)
{
! signalfd_state_t *state;
minor_t minor = getminor(dev);
sigset_t mask;
! state = ddi_get_soft_state(signalfd_softstate, minor);
switch (cmd) {
case SIGNALFDIOC_MASK:
if (ddi_copyin((caddr_t)arg, (caddr_t)&mask, sizeof (sigset_t),
md) != 0)
--- 624,639 ----
_NOTE(ARGSUSED(4))
static int
signalfd_ioctl(dev_t dev, int cmd, intptr_t arg, int md, cred_t *cr, int *rv)
{
! signalfd_state_t *state, **sstate;
minor_t minor = getminor(dev);
sigset_t mask;
! sstate = ddi_get_soft_state(signalfd_softstate, minor);
! state = *sstate;
switch (cmd) {
case SIGNALFDIOC_MASK:
if (ddi_copyin((caddr_t)arg, (caddr_t)&mask, sizeof (sigset_t),
md) != 0)
*** 585,621 ****
_NOTE(ARGSUSED(1))
static int
signalfd_close(dev_t dev, int flag, int otyp, cred_t *cred_p)
{
! signalfd_state_t *state, **sp;
minor_t minor = getminor(dev);
proc_t *p = curproc;
! state = ddi_get_soft_state(signalfd_softstate, minor);
! if (state->sfd_pollhd.ph_list != NULL) {
! pollwakeup(&state->sfd_pollhd, POLLERR);
! pollhead_clean(&state->sfd_pollhd);
! }
!
! /* Make sure our state is removed from our proc's pollwake list. */
mutex_enter(&p->p_lock);
! signalfd_wake_list_rm(p, state);
mutex_exit(&p->p_lock);
mutex_enter(&signalfd_lock);
! /* Remove our state from our global list. */
! for (sp = &signalfd_state; *sp != state; sp = &((*sp)->sfd_next))
! VERIFY(*sp != NULL);
!
! *sp = (*sp)->sfd_next;
!
ddi_soft_state_free(signalfd_softstate, minor);
id_free(signalfd_minor, minor);
mutex_exit(&signalfd_lock);
return (0);
}
--- 654,697 ----
_NOTE(ARGSUSED(1))
static int
signalfd_close(dev_t dev, int flag, int otyp, cred_t *cred_p)
{
! signalfd_state_t *state, **sstate;
! sigfd_poll_waiter_t *pw = NULL;
minor_t minor = getminor(dev);
proc_t *p = curproc;
! sstate = ddi_get_soft_state(signalfd_softstate, minor);
! state = *sstate;
! /* Make sure state is removed from this proc's pollwake list. */
mutex_enter(&p->p_lock);
! if (p->p_sigfd != NULL) {
! sigfd_proc_state_t *pstate = p->p_sigfd;
!
! pw = signalfd_wake_list_rm(pstate, state);
! if (list_is_empty(&pstate->sigfd_list)) {
! signalfd_wake_list_cleanup(p);
! }
! }
mutex_exit(&p->p_lock);
+ if (pw != NULL) {
+ pollwakeup(&pw->spw_pollhd, POLLERR);
+ pollhead_clean(&pw->spw_pollhd);
+ kmem_free(pw, sizeof (*pw));
+ }
+
mutex_enter(&signalfd_lock);
! *sstate = NULL;
ddi_soft_state_free(signalfd_softstate, minor);
id_free(signalfd_minor, minor);
+ signalfd_state_release(state, B_TRUE);
+
mutex_exit(&signalfd_lock);
return (0);
}
*** 633,643 ****
mutex_exit(&signalfd_lock);
return (DDI_FAILURE);
}
if (ddi_soft_state_init(&signalfd_softstate,
! sizeof (signalfd_state_t), 0) != 0) {
cmn_err(CE_WARN, "signalfd failed to create soft state");
id_space_destroy(signalfd_minor);
mutex_exit(&signalfd_lock);
return (DDI_FAILURE);
}
--- 709,719 ----
mutex_exit(&signalfd_lock);
return (DDI_FAILURE);
}
if (ddi_soft_state_init(&signalfd_softstate,
! sizeof (signalfd_state_t *), 0) != 0) {
cmn_err(CE_WARN, "signalfd failed to create soft state");
id_space_destroy(signalfd_minor);
mutex_exit(&signalfd_lock);
return (DDI_FAILURE);
}
*** 654,663 ****
--- 730,745 ----
ddi_report_dev(devi);
signalfd_devi = devi;
sigfd_exit_helper = signalfd_exit_helper;
+ list_create(&signalfd_state, sizeof (signalfd_state_t),
+ offsetof(signalfd_state_t, sfd_list));
+
+ signalfd_wakeq = taskq_create("signalfd_wake", 1, minclsyspri,
+ 0, INT_MAX, TASKQ_PREPOPULATE);
+
mutex_exit(&signalfd_lock);
return (DDI_SUCCESS);
}
*** 671,684 ****
default:
return (DDI_FAILURE);
}
- /* list should be empty */
- VERIFY(signalfd_state == NULL);
-
mutex_enter(&signalfd_lock);
id_space_destroy(signalfd_minor);
ddi_remove_minor_node(signalfd_devi, NULL);
signalfd_devi = NULL;
sigfd_exit_helper = NULL;
--- 753,781 ----
default:
return (DDI_FAILURE);
}
mutex_enter(&signalfd_lock);
+
+ if (!list_is_empty(&signalfd_state)) {
+ /*
+ * There are dangling poll waiters holding signalfd_state_t
+ * entries on the global list. Detach is not possible until
+ * they purge themselves.
+ */
+ mutex_exit(&signalfd_lock);
+ return (DDI_FAILURE);
+ }
+ list_destroy(&signalfd_state);
+
+ /*
+ * With no remaining entries in the signalfd_state list, the wake taskq
+ * should be empty with no possibility for new entries.
+ */
+ taskq_destroy(signalfd_wakeq);
+
id_space_destroy(signalfd_minor);
ddi_remove_minor_node(signalfd_devi, NULL);
signalfd_devi = NULL;
sigfd_exit_helper = NULL;