OS-5464 signalfd deadlock on pollwakeup
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
OS-5370 panic in signalfd
Reviewed by: Jerry Jelinek <jerry.jelinek@joyent.com>
Reviewed by: Robert Mustacchi <rm@joyent.com>
OS-3742 lxbrand add support for signalfd
OS-4382 remove obsolete brand hooks added during lx development
--- old/usr/src/uts/common/io/signalfd.c
+++ new/usr/src/uts/common/io/signalfd.c
1 1 /*
2 2 * This file and its contents are supplied under the terms of the
3 3 * Common Development and Distribution License ("CDDL"), version 1.0.
4 4 * You may only use this file in accordance with the terms of version
5 5 * 1.0 of the CDDL.
6 6 *
7 7 * A full copy of the text of the CDDL should have accompanied this
8 8 * source. A copy of the CDDL is also available via the Internet at
9 9 * http://www.illumos.org/license/CDDL.
10 10 */
11 11
12 12 /*
13 - * Copyright 2015 Joyent, Inc.
13 + * Copyright 2016 Joyent, Inc.
14 14 */
15 15
16 16 /*
17 17 * Support for the signalfd facility, a Linux-borne facility for
18 18 * file descriptor-based synchronous signal consumption.
19 19 *
20 20 * As described on the signalfd(3C) man page, the general idea behind these
21 21 * file descriptors is that they can be used to synchronously consume signals
22 - * via the read(2) syscall. That capability already exists with the
23 - * sigwaitinfo(3C) function but the key advantage of signalfd is that, because
24 - * it is file descriptor based, poll(2) can be used to determine when signals
25 - * are available to be consumed.
22 + * via the read(2) syscall. While that capability already exists with the
23 + * sigwaitinfo(3C) function, signalfd holds an advantage since it is file
24 + * descriptor based: It is able use the event facilities (poll(2), /dev/poll,
25 + * event ports) to notify interested parties when consumable signals arrive.
26 26 *
27 - * The general implementation uses signalfd_state to hold both the signal set
28 - * and poll head for an open file descriptor. Because a process can be using
29 - * different sigfds with different signal sets, each signalfd_state poll head
30 - * can be thought of as an independent signal stream and the thread(s) waiting
31 - * on that stream will get poll notification when any signal in the
32 - * corresponding set is received.
27 + * The signalfd lifecycle begins when a process opens /dev/signalfd. A minor
28 + * number is allocated for it along with an associated signalfd_state_t
29 + * struct. It is there that the mask of desired signals resides.
33 30 *
34 - * The sigfd_proc_state_t struct lives on the proc_t and maintains per-proc
35 - * state for function callbacks and data when the proc needs to do work during
36 - * signal delivery for pollwakeup.
31 + * Reading from the signalfd is straightforward and mimics the kernel behavior
32 + * for sigtimedwait(). Signals continue to live on either the proc's p_sig or
33 + * the thread's t_sig member. During a read operation, those which match the
34 + * mask are consumed so they are no longer pending.
37 35 *
38 - * The read side of the implementation is straightforward and mimics the
39 - * kernel behavior for sigtimedwait(). Signals continue to live on either
40 - * the proc's p_sig, or thread's t_sig, member. Read consumes the signal so
41 - * that it is no longer pending.
36 + * The poll side is more complex. Every time a signal is delivered, all of the
37 + * signalfds on the process need to be examined in order to pollwake threads
38 + * waiting for signal arrival.
42 39 *
43 - * The poll side is more complex since all of the sigfds on the process need
44 - * to be examined every time a signal is delivered to the process in order to
45 - * pollwake any thread waiting in poll for that signal.
40 + * When a thread polling on a signalfd requires a pollhead, several steps must
41 + * be taken to safely ensure the proper result. A sigfd_proc_state_t is
42 + * created for the calling process if it does not yet exist. It is there that
43 + * the list of sigfd_poll_waiter_t structures resides; each associates a
44 + * pollhead with a signalfd_state_t entry. The list is walked to find a
45 + * sigfd_poll_waiter_t matching the signalfd_state_t which corresponds to the
46 + * polled resource. If one is found, it is reused. Otherwise a new one is
47 + * created, incrementing the refcount on the signalfd_state_t, and it is
48 + * added to the list.
46 49 *
47 - * Because it is likely that a process will only be using one, or a few, sigfds,
48 - * but many total file descriptors, we maintain a list of sigfds which need
49 - * pollwakeup. The list lives on the proc's p_sigfd struct. In this way only
50 - * zero, or a few, of the state structs will need to be examined every time a
51 - * signal is delivered to the process, instead of having to examine all of the
52 - * file descriptors to find the state structs. When a state struct with a
53 - * matching signal set is found then pollwakeup is called.
50 + * The complications imposed by fork(2) are why the pollhead is stored in the
51 + * associated sigfd_poll_waiter_t instead of directly in the signalfd_state_t.
52 + * More than one process can hold a reference to the signalfd at a time but
53 + * arriving signals should wake only process-local pollers. Additionally,
54 + * signalfd_close is called only when the last referencing fd is closed, hiding
55 + * the preceding releases performed by other referencing threads. This
56 + * necessitates reference counting on the signalfd_state_t so it can persist
57 + * after close until all poll references have been released. Doing so ensures
58 + * that blocked pollers which hold references to the signalfd_state_t will be
59 + * able to perform clean-up after the descriptor itself has been closed.
54 60 *
55 - * The sigfd_list is self-cleaning; as signalfd_pollwake_cb is called, the list
56 - * will clear out on its own. There is an exit helper (signalfd_exit_helper)
57 - * which cleans up any remaining per-proc state when the process exits.
61 + * When a signal arrives in a process polling on signalfd, signalfd_pollwake_cb
62 + * is called via the pointer in sigfd_proc_state_t. It will walk over the
63 + * sigfd_poll_waiter_t entries present in the list, searching for any whose
64 + * signalfd_state_t carries a matching signal mask. The
65 + * approach of keeping the poller list in p_sigfd was chosen because a process
66 + * is likely to use few signalfds relative to its total file descriptors. It
67 + * reduces the work required for each received signal.
58 68 *
59 - * The main complexity with signalfd is the interaction of forking and polling.
60 - * This interaction is complex because now two processes have a fd that
61 - * references the same dev_t (and its associated signalfd_state), but signals
62 - * go to only one of those processes. Also, we don't know when one of the
63 - * processes closes its fd because our 'close' entry point is only called when
64 - * the last fd is closed (which could be by either process).
69 + * When matching sigfd_poll_waiter_t entries are encountered in the poller list
70 + * during signalfd_pollwake_cb, they are dispatched into signalfd_wakeq to
71 + * perform the pollwake. This is due to a lock ordering conflict between
72 + * signalfd_poll and signalfd_pollwake_cb. The former acquires
73 + * pollcache_t`pc_lock before proc_t`p_lock. The latter (via sigtoproc)
74 + * reverses the order. Deferring the pollwake into a taskq means it can be
75 + * performed without proc_t`p_lock held, avoiding the deadlock.
65 76 *
66 - * Because the state struct is referenced by both file descriptors, and the
67 - * state struct represents a signal stream needing a pollwakeup, if both
68 - * processes were polling then both processes would get a pollwakeup when a
69 - * signal arrives for either process (that is, the pollhead is associated with
70 - * our dev_t so when a signal arrives the pollwakeup wakes up all waiters).
77 + * The sigfd_list is self-cleaning; as signalfd_pollwake_cb is called, the list
78 + * will clear out on its own. Any per-process state still present at process
79 + * exit is cleaned up by the exit helper (signalfd_exit_helper).
71 80 *
72 - * Fortunately this is not a common problem in practice, but the implementation
73 - * attempts to mitigate unexpected behavior. The typical behavior is that the
74 - * parent has been polling the signalfd (which is why it was open in the first
75 - * place) and the parent might have a pending signalfd_state (with the
76 - * pollhead) on its per-process sigfd_list. After the fork the child will
77 - * simply close that fd (among others) as part of the typical fork/close/exec
78 - * pattern. Because the child will never poll that fd, it will never get any
79 - * state onto its own sigfd_list (the child starts with a null list). The
80 - * intention is that the child sees no pollwakeup activity for signals unless
81 - * it explicitly reinvokes poll on the sigfd.
81 + * The structures associated with signalfd state are designed to operate
82 + * correctly across fork, but there is one caveat that applies. Using
83 + * fork-shared signalfd descriptors in conjunction with fork-shared caching poll
84 + * descriptors (such as /dev/poll or event ports) will result in missed poll
85 + * wake-ups. This is caused by the pollhead identity of signalfd descriptors
86 + * being dependent on the process they are polled from. Because it has a
87 + * thread-local cache, poll(2) is unaffected by this limitation.
82 88 *
83 - * As background, there are two primary polling cases to consider when the
84 - * parent process forks:
85 - * 1) If any thread is blocked in poll(2) then both the parent and child will
86 - * return from the poll syscall with EINTR. This means that if either
87 - * process wants to re-poll on a sigfd then it needs to re-run poll and
88 - * would come back in to the signalfd_poll entry point. The parent would
89 - * already have the dev_t's state on its sigfd_list and the child would not
90 - * have anything there unless it called poll again on its fd.
91 - * 2) If the process is using /dev/poll(7D) then the polling info is being
92 - * cached by the poll device and the process might not currently be blocked
93 - * on anything polling related. A subsequent DP_POLL ioctl will not invoke
94 - * our signalfd_poll entry point again. Because the parent still has its
95 - * sigfd_list setup, an incoming signal will hit our signalfd_pollwake_cb
96 - * entry point, which in turn calls pollwake, and /dev/poll will do the
97 - * right thing on DP_POLL. The child will not have a sigfd_list yet so the
98 - * signal will not cause a pollwakeup. The dp code does its own handling for
99 - * cleaning up its cache.
89 + * Lock ordering:
100 90 *
101 - * This leaves only one odd corner case. If the parent and child both use
102 - * the dup-ed sigfd to poll then when a signal is delivered to either process
103 - * there is no way to determine which one should get the pollwakeup (since
104 - * both processes will be queued on the same signal stream poll head). What
105 - * happens in this case is that both processes will return from poll, but only
106 - * one of them will actually have a signal to read. The other will return
107 - * from read with EAGAIN, or block. This case is actually similar to the
108 - * situation within a single process which got two different sigfd's with the
109 - * same mask (or poll on two fd's that are dup-ed). Both would return from poll
110 - * when a signal arrives but only one read would consume the signal and the
111 - * other read would fail or block. Applications which poll on shared fd's
112 - * cannot assume that a subsequent read will actually obtain data.
91 + * 1. signalfd_lock
92 + * 2. signalfd_state_t`sfd_lock
93 + *
94 + * 1. proc_t`p_lock (to walk p_sigfd)
95 + * 2. signalfd_state_t`sfd_lock
96 + * 2a. signalfd_lock (after sfd_lock is dropped, when sfd_count falls to 0)
113 97 */
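
To make the semantics described above concrete, a minimal consumer looks
roughly like the following (a sketch against the signalfd(3C) interface
referenced in the comment; error handling abbreviated):

    #include <sys/signalfd.h>
    #include <signal.h>
    #include <poll.h>
    #include <unistd.h>
    #include <err.h>

    int
    main(void)
    {
            sigset_t mask;
            signalfd_siginfo_t ssi;
            struct pollfd pfd;
            int fd;

            /* Block SIGUSR1 so delivery is deferred to the descriptor. */
            (void) sigemptyset(&mask);
            (void) sigaddset(&mask, SIGUSR1);
            (void) sigprocmask(SIG_BLOCK, &mask, NULL);

            if ((fd = signalfd(-1, &mask, 0)) < 0)
                    err(1, "signalfd");

            /* poll(2) reports POLLIN once a signal in the mask is pending. */
            pfd.fd = fd;
            pfd.events = POLLIN;
            if (poll(&pfd, 1, -1) < 0)
                    err(1, "poll");

            /* read(2) consumes the pending signal, per the comment above. */
            if (read(fd, &ssi, sizeof (ssi)) != sizeof (ssi))
                    err(1, "read");

            (void) close(fd);
            return (0);
    }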
114 98
115 99 #include <sys/ddi.h>
116 100 #include <sys/sunddi.h>
117 101 #include <sys/signalfd.h>
118 102 #include <sys/conf.h>
119 103 #include <sys/sysmacros.h>
120 104 #include <sys/filio.h>
121 105 #include <sys/stat.h>
122 106 #include <sys/file.h>
123 107 #include <sys/schedctl.h>
124 108 #include <sys/id_space.h>
125 109 #include <sys/sdt.h>
110 +#include <sys/brand.h>
111 +#include <sys/disp.h>
112 +#include <sys/taskq_impl.h>
126 113
127 114 typedef struct signalfd_state signalfd_state_t;
128 115
129 116 struct signalfd_state {
130 - kmutex_t sfd_lock; /* lock protecting state */
131 - pollhead_t sfd_pollhd; /* poll head */
132 - k_sigset_t sfd_set; /* signals for this fd */
133 - signalfd_state_t *sfd_next; /* next state on global list */
117 + list_node_t sfd_list; /* node in global list */
118 + kmutex_t sfd_lock; /* protects fields below */
119 + uint_t sfd_count; /* ref count */
120 + boolean_t sfd_valid; /* valid while open */
121 + k_sigset_t sfd_set; /* signals for this fd */
134 122 };
135 123
124 +typedef struct sigfd_poll_waiter {
125 + list_node_t spw_list;
126 + signalfd_state_t *spw_state;
127 + pollhead_t spw_pollhd;
128 + taskq_ent_t spw_taskent;
129 + short spw_pollev;
130 +} sigfd_poll_waiter_t;
131 +
136 132 /*
137 - * Internal global variables.
133 + * Protects global state in signalfd_devi, signalfd_minor, signalfd_softstate,
134 + * and signalfd_state (including the sfd_list field of each member).
138 135 */
139 -static kmutex_t signalfd_lock; /* lock protecting state */
136 +static kmutex_t signalfd_lock;
140 137 static dev_info_t *signalfd_devi; /* device info */
141 138 static id_space_t *signalfd_minor; /* minor number arena */
142 139 static void *signalfd_softstate; /* softstate pointer */
143 -static signalfd_state_t *signalfd_state; /* global list of state */
140 +static list_t signalfd_state; /* global list of state */
141 +static taskq_t *signalfd_wakeq; /* pollwake event taskq */
144 142
145 -/*
146 - * If we don't already have an entry in the proc's list for this state, add one.
147 - */
143 +
148 144 static void
149 -signalfd_wake_list_add(signalfd_state_t *state)
145 +signalfd_state_enter_locked(signalfd_state_t *state)
150 146 {
151 - proc_t *p = curproc;
152 - list_t *lst;
153 - sigfd_wake_list_t *wlp;
147 + ASSERT(MUTEX_HELD(&state->sfd_lock));
148 + ASSERT(state->sfd_count > 0);
149 + VERIFY(state->sfd_valid == B_TRUE);
154 150
155 - ASSERT(MUTEX_HELD(&p->p_lock));
156 - ASSERT(p->p_sigfd != NULL);
151 + state->sfd_count++;
152 +}
157 153
158 - lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
159 - for (wlp = list_head(lst); wlp != NULL; wlp = list_next(lst, wlp)) {
160 - if (wlp->sigfd_wl_state == state)
161 - break;
154 +static void
155 +signalfd_state_release(signalfd_state_t *state, boolean_t force_invalidate)
156 +{
157 + mutex_enter(&state->sfd_lock);
158 +
159 + if (force_invalidate) {
160 + state->sfd_valid = B_FALSE;
162 161 }
163 162
164 - if (wlp == NULL) {
165 - wlp = kmem_zalloc(sizeof (sigfd_wake_list_t), KM_SLEEP);
166 - wlp->sigfd_wl_state = state;
167 - list_insert_head(lst, wlp);
163 + ASSERT(state->sfd_count > 0);
164 + if (state->sfd_count == 1) {
165 + VERIFY(state->sfd_valid == B_FALSE);
166 + mutex_exit(&state->sfd_lock);
167 + if (force_invalidate) {
168 + /*
169 + * The invalidation performed in signalfd_close is done
170 + * while signalfd_lock is held.
171 + */
172 + ASSERT(MUTEX_HELD(&signalfd_lock));
173 + list_remove(&signalfd_state, state);
174 + } else {
175 + ASSERT(MUTEX_NOT_HELD(&signalfd_lock));
176 + mutex_enter(&signalfd_lock);
177 + list_remove(&signalfd_state, state);
178 + mutex_exit(&signalfd_lock);
179 + }
180 + kmem_free(state, sizeof (*state));
181 + return;
168 182 }
183 + state->sfd_count--;
184 + mutex_exit(&state->sfd_lock);
169 185 }
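
In outline, the rest of the driver pairs these two helpers like so (compare
signalfd_wake_list_add below):

    mutex_enter(&state->sfd_lock);
    signalfd_state_enter_locked(state);     /* take a hold; must be valid */
    mutex_exit(&state->sfd_lock);

    /* ... stash the pointer, e.g. in a sigfd_poll_waiter_t ... */

    signalfd_state_release(state, B_FALSE); /* drop hold; frees on last one */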
170 186
171 -static void
172 -signalfd_wake_rm(list_t *lst, sigfd_wake_list_t *wlp)
187 +static sigfd_poll_waiter_t *
188 +signalfd_wake_list_add(sigfd_proc_state_t *pstate, signalfd_state_t *state)
173 189 {
174 - list_remove(lst, wlp);
175 - kmem_free(wlp, sizeof (sigfd_wake_list_t));
176 -}
190 + list_t *lst = &pstate->sigfd_list;
191 + sigfd_poll_waiter_t *pw;
177 192
178 -static void
179 -signalfd_wake_list_rm(proc_t *p, signalfd_state_t *state)
180 -{
181 - sigfd_wake_list_t *wlp;
182 - list_t *lst;
193 + for (pw = list_head(lst); pw != NULL; pw = list_next(lst, pw)) {
194 + if (pw->spw_state == state)
195 + break;
196 + }
183 197
184 - ASSERT(MUTEX_HELD(&p->p_lock));
198 + if (pw == NULL) {
199 + pw = kmem_zalloc(sizeof (*pw), KM_SLEEP);
185 200
186 - if (p->p_sigfd == NULL)
187 - return;
201 + mutex_enter(&state->sfd_lock);
202 + signalfd_state_enter_locked(state);
203 + pw->spw_state = state;
204 + mutex_exit(&state->sfd_lock);
205 + list_insert_head(lst, pw);
206 + }
207 + return (pw);
208 +}
188 209
189 - lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
190 - for (wlp = list_head(lst); wlp != NULL; wlp = list_next(lst, wlp)) {
191 - if (wlp->sigfd_wl_state == state) {
192 - signalfd_wake_rm(lst, wlp);
210 +static sigfd_poll_waiter_t *
211 +signalfd_wake_list_rm(sigfd_proc_state_t *pstate, signalfd_state_t *state)
212 +{
213 + list_t *lst = &pstate->sigfd_list;
214 + sigfd_poll_waiter_t *pw;
215 +
216 + for (pw = list_head(lst); pw != NULL; pw = list_next(lst, pw)) {
217 + if (pw->spw_state == state) {
193 218 break;
194 219 }
195 220 }
196 221
197 - if (list_is_empty(lst)) {
198 - ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb = NULL;
199 - list_destroy(lst);
200 - kmem_free(p->p_sigfd, sizeof (sigfd_proc_state_t));
201 - p->p_sigfd = NULL;
222 + if (pw != NULL) {
223 + list_remove(lst, pw);
224 + pw->spw_state = NULL;
225 + signalfd_state_release(state, B_FALSE);
202 226 }
227 +
228 + return (pw);
203 229 }
204 230
205 231 static void
206 232 signalfd_wake_list_cleanup(proc_t *p)
207 233 {
208 - sigfd_wake_list_t *wlp;
234 + sigfd_proc_state_t *pstate = p->p_sigfd;
235 + sigfd_poll_waiter_t *pw;
209 236 list_t *lst;
210 237
211 238 ASSERT(MUTEX_HELD(&p->p_lock));
239 + ASSERT(pstate != NULL);
212 240
213 - ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb = NULL;
241 + lst = &pstate->sigfd_list;
242 + while ((pw = list_remove_head(lst)) != NULL) {
243 + signalfd_state_t *state = pw->spw_state;
214 244
215 - lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
216 - while (!list_is_empty(lst)) {
217 - wlp = (sigfd_wake_list_t *)list_remove_head(lst);
218 - kmem_free(wlp, sizeof (sigfd_wake_list_t));
245 + pw->spw_state = NULL;
246 + signalfd_state_release(state, B_FALSE);
247 +
248 + pollwakeup(&pw->spw_pollhd, POLLERR);
249 + pollhead_clean(&pw->spw_pollhd);
250 + kmem_free(pw, sizeof (*pw));
219 251 }
252 + list_destroy(lst);
253 +
254 + p->p_sigfd = NULL;
255 + kmem_free(pstate, sizeof (*pstate));
220 256 }
221 257
222 258 static void
223 259 signalfd_exit_helper(void)
224 260 {
225 261 proc_t *p = curproc;
226 - list_t *lst;
227 262
228 - /* This being non-null is the only way we can get here */
229 - ASSERT(p->p_sigfd != NULL);
230 -
231 263 mutex_enter(&p->p_lock);
232 - lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
233 -
234 264 signalfd_wake_list_cleanup(p);
235 - list_destroy(lst);
236 - kmem_free(p->p_sigfd, sizeof (sigfd_proc_state_t));
237 - p->p_sigfd = NULL;
238 265 mutex_exit(&p->p_lock);
239 266 }
240 267
241 268 /*
269 + * Perform the pollwake for a sigfd_poll_waiter_t entry.
270 + * Because signalfd_poll and signalfd_pollwake_cb require conflicting lock
271 + * orders (pc_lock before p_lock for the former, p_lock before pc_lock for
272 + * the latter), the pollwake is relegated to a taskq to avoid deadlock.
273 + */
274 +static void
275 +signalfd_wake_task(void *arg)
276 +{
277 + sigfd_poll_waiter_t *pw = arg;
278 + signalfd_state_t *state = pw->spw_state;
279 +
280 + pw->spw_state = NULL;
281 + signalfd_state_release(state, B_FALSE);
282 + pollwakeup(&pw->spw_pollhd, pw->spw_pollev);
283 + pollhead_clean(&pw->spw_pollhd);
284 + kmem_free(pw, sizeof (*pw));
285 +}
286 +
287 +/*
242 288 * Called every time a signal is delivered to the process so that we can
243 289 * see if any signal stream needs a pollwakeup. We maintain a list of
244 290 * signal state elements so that we don't have to look at every file descriptor
245 291 * on the process. If necessary, a further optimization would be to maintain a
246 292 * signal set mask that is a union of all of the sets in the list so that
247 293 * we don't even traverse the list if the signal is not in one of the elements.
248 294 * However, since the list is likely to be very short, this is not currently
249 295 * being done. A more complex data structure might also be used, but it is
250 296 * unclear what that would be since each signal set needs to be checked for a
251 297 * match.
252 298 */
253 299 static void
254 300 signalfd_pollwake_cb(void *arg0, int sig)
255 301 {
256 302 proc_t *p = (proc_t *)arg0;
303 + sigfd_proc_state_t *pstate = (sigfd_proc_state_t *)p->p_sigfd;
257 304 list_t *lst;
258 - sigfd_wake_list_t *wlp;
305 + sigfd_poll_waiter_t *pw;
259 306
260 307 ASSERT(MUTEX_HELD(&p->p_lock));
308 + ASSERT(pstate != NULL);
261 309
262 - if (p->p_sigfd == NULL)
263 - return;
310 + lst = &pstate->sigfd_list;
311 + pw = list_head(lst);
312 + while (pw != NULL) {
313 + signalfd_state_t *state = pw->spw_state;
314 + sigfd_poll_waiter_t *next;
264 315
265 - lst = &((sigfd_proc_state_t *)p->p_sigfd)->sigfd_list;
266 - wlp = list_head(lst);
267 - while (wlp != NULL) {
268 - signalfd_state_t *state = wlp->sigfd_wl_state;
269 -
270 316 mutex_enter(&state->sfd_lock);
271 -
272 - if (sigismember(&state->sfd_set, sig) &&
273 - state->sfd_pollhd.ph_list != NULL) {
274 - sigfd_wake_list_t *tmp = wlp;
275 -
276 - /* remove it from the list */
277 - wlp = list_next(lst, wlp);
278 - signalfd_wake_rm(lst, tmp);
279 -
280 - mutex_exit(&state->sfd_lock);
281 - pollwakeup(&state->sfd_pollhd, POLLRDNORM | POLLIN);
317 + if (!state->sfd_valid) {
318 + pw->spw_pollev = POLLERR;
319 + } else if (sigismember(&state->sfd_set, sig)) {
320 + pw->spw_pollev = POLLRDNORM | POLLIN;
282 321 } else {
283 322 mutex_exit(&state->sfd_lock);
284 - wlp = list_next(lst, wlp);
323 + pw = list_next(lst, pw);
324 + continue;
285 325 }
326 + mutex_exit(&state->sfd_lock);
327 +
328 + /*
329 + * Pull the sigfd_poll_waiter_t out of the list and dispatch it
330 + * to perform a pollwake. This cannot be done synchronously
331 + * since signalfd_poll and signalfd_pollwake_cb have
332 + * conflicting lock orders which can deadlock.
333 + */
334 + next = list_next(lst, pw);
335 + list_remove(lst, pw);
336 + taskq_dispatch_ent(signalfd_wakeq, signalfd_wake_task, pw, 0,
337 + &pw->spw_taskent);
338 + pw = next;
286 339 }
287 340 }
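
The union-mask shortcut contemplated in the comment above might look like the
following sketch (the sigfd_union_set field is hypothetical and would have to
be rebuilt whenever the list membership or any sfd_set changes; invalidated
entries needing POLLERR would still force a walk):

    static void
    signalfd_pollwake_cb_fast(void *arg0, int sig)
    {
            proc_t *p = (proc_t *)arg0;
            sigfd_proc_state_t *pstate = (sigfd_proc_state_t *)p->p_sigfd;

            ASSERT(MUTEX_HELD(&p->p_lock));

            /* Skip the list walk when no waiter's set contains sig. */
            if (!sigismember(&pstate->sigfd_union_set, sig))
                    return;

            signalfd_pollwake_cb(arg0, sig);
    }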
288 341
289 342 _NOTE(ARGSUSED(1))
290 343 static int
291 344 signalfd_open(dev_t *devp, int flag, int otyp, cred_t *cred_p)
292 345 {
293 - signalfd_state_t *state;
346 + signalfd_state_t *state, **sstate;
294 347 major_t major = getemajor(*devp);
295 348 minor_t minor = getminor(*devp);
296 349
297 350 if (minor != SIGNALFDMNRN_SIGNALFD)
298 351 return (ENXIO);
299 352
300 353 mutex_enter(&signalfd_lock);
301 354
302 355 minor = (minor_t)id_allocff(signalfd_minor);
303 -
304 356 if (ddi_soft_state_zalloc(signalfd_softstate, minor) != DDI_SUCCESS) {
305 357 id_free(signalfd_minor, minor);
306 358 mutex_exit(&signalfd_lock);
307 359 return (ENODEV);
308 360 }
309 361
310 - state = ddi_get_soft_state(signalfd_softstate, minor);
362 + state = kmem_zalloc(sizeof (*state), KM_SLEEP);
363 + state->sfd_valid = B_TRUE;
364 + state->sfd_count = 1;
365 + list_insert_head(&signalfd_state, (void *)state);
366 +
367 + sstate = ddi_get_soft_state(signalfd_softstate, minor);
368 + *sstate = state;
311 369 *devp = makedevice(major, minor);
312 370
313 - state->sfd_next = signalfd_state;
314 - signalfd_state = state;
315 -
316 371 mutex_exit(&signalfd_lock);
317 372
318 373 return (0);
319 374 }
320 375
321 376 /*
322 377 * Consume one signal from our set in a manner similar to sigtimedwait().
323 378 * The block parameter is used to control whether we wait for a signal or
324 379 * return immediately if no signal is pending. We use the thread's t_sigwait
325 380 * member in the same way that it is used by sigtimedwait.
326 381 *
327 382 * Return 0 if we successfully consumed a signal or an errno if not.
328 383 */
329 384 static int
330 385 consume_signal(k_sigset_t set, uio_t *uio, boolean_t block)
331 386 {
332 387 k_sigset_t oldmask;
333 388 kthread_t *t = curthread;
334 389 klwp_t *lwp = ttolwp(t);
335 390 proc_t *p = ttoproc(t);
336 391 timespec_t now;
337 392 timespec_t *rqtp = NULL; /* null means blocking */
338 393 int timecheck = 0;
339 394 int ret = 0;
340 395 k_siginfo_t info, *infop;
341 396 signalfd_siginfo_t ssi, *ssp = &ssi;
342 397
343 398 if (block == B_FALSE) {
344 399 timecheck = timechanged;
345 400 gethrestime(&now);
346 401 rqtp = &now; /* non-blocking check for pending signals */
347 402 }
348 403
349 404 t->t_sigwait = set;
350 405
351 406 mutex_enter(&p->p_lock);
352 407 /*
353 408 * set the thread's signal mask to unmask those signals in the
354 409 * specified set.
355 410 */
356 411 schedctl_finish_sigblock(t);
357 412 oldmask = t->t_hold;
358 413 sigdiffset(&t->t_hold, &t->t_sigwait);
359 414
360 415 /*
361 416 * Based on rqtp, wait indefinitely until we take a signal in our set
362 417 * or return immediately if there are no signals pending from our set.
363 418 */
364 419 while ((ret = cv_waituntil_sig(&t->t_delay_cv, &p->p_lock, rqtp,
365 420 timecheck)) > 0)
366 421 continue;
367 422
368 423 /* Restore thread's signal mask to its previous value. */
369 424 t->t_hold = oldmask;
370 425 t->t_sig_check = 1; /* so post_syscall sees new t_hold mask */
371 426
372 427 if (ret == -1) {
373 428 /* no signals pending */
374 429 mutex_exit(&p->p_lock);
375 430 sigemptyset(&t->t_sigwait);
376 431 return (EAGAIN); /* no signals pending */
377 432 }
378 433
379 434 /* Don't bother with signal if it is not in request set. */
380 435 if (lwp->lwp_cursig == 0 ||
381 436 !sigismember(&t->t_sigwait, lwp->lwp_cursig)) {
382 437 mutex_exit(&p->p_lock);
383 438 /*
384 439 * lwp_cursig is zero if pokelwps() awakened cv_wait_sig().
385 440 * This happens if some other thread in this process called
386 441 * forkall() or exit().
387 442 */
388 443 sigemptyset(&t->t_sigwait);
389 444 return (EINTR);
390 445 }
391 446
392 447 if (lwp->lwp_curinfo) {
393 448 infop = &lwp->lwp_curinfo->sq_info;
394 449 } else {
395 450 infop = &info;
396 451 bzero(infop, sizeof (info));
397 452 infop->si_signo = lwp->lwp_cursig;
398 453 infop->si_code = SI_NOINFO;
399 454 }
400 455
401 456 lwp->lwp_ru.nsignals++;
402 457
403 458 DTRACE_PROC2(signal__clear, int, ret, ksiginfo_t *, infop);
404 459 lwp->lwp_cursig = 0;
405 460 lwp->lwp_extsig = 0;
406 461 mutex_exit(&p->p_lock);
407 462
463 + if (PROC_IS_BRANDED(p) && BROP(p)->b_sigfd_translate)
464 + BROP(p)->b_sigfd_translate(infop);
465 +
408 466 /* Convert k_siginfo into external, datamodel independent, struct. */
409 467 bzero(ssp, sizeof (*ssp));
410 468 ssp->ssi_signo = infop->si_signo;
411 469 ssp->ssi_errno = infop->si_errno;
412 470 ssp->ssi_code = infop->si_code;
413 471 ssp->ssi_pid = infop->si_pid;
414 472 ssp->ssi_uid = infop->si_uid;
415 473 ssp->ssi_fd = infop->si_fd;
416 474 ssp->ssi_band = infop->si_band;
417 475 ssp->ssi_trapno = infop->si_trapno;
418 476 ssp->ssi_status = infop->si_status;
419 477 ssp->ssi_utime = infop->si_utime;
420 478 ssp->ssi_stime = infop->si_stime;
421 479 ssp->ssi_addr = (uint64_t)(intptr_t)infop->si_addr;
422 480
423 481 ret = uiomove(ssp, sizeof (*ssp), UIO_READ, uio);
424 482
425 483 if (lwp->lwp_curinfo) {
426 484 siginfofree(lwp->lwp_curinfo);
427 485 lwp->lwp_curinfo = NULL;
428 486 }
429 487 sigemptyset(&t->t_sigwait);
430 488 return (ret);
431 489 }
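
For comparison, the non-blocking path above (rqtp pointing at the current
time) matches a zero-timeout sigtimedwait(3C) in user space:

    #include <signal.h>
    #include <time.h>
    #include <errno.h>

    /* Returns 1 if a pending SIGUSR1 was consumed, 0 if none was pending. */
    static int
    check_pending(void)
    {
            struct timespec zero = { 0, 0 };
            sigset_t set;
            siginfo_t si;

            (void) sigemptyset(&set);
            (void) sigaddset(&set, SIGUSR1);

            /* Zero timeout: fail with EAGAIN instead of blocking. */
            if (sigtimedwait(&set, &si, &zero) == -1)
                    return (errno == EAGAIN ? 0 : -1);
            return (1);
    }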
432 490
433 491 /*
434 492 * This is similar to sigtimedwait. Based on the fd mode we may wait until a
435 493 * signal within our specified set is posted. We consume as many available
436 494 * signals within our set as we can.
437 495 */
438 496 _NOTE(ARGSUSED(2))
439 497 static int
440 498 signalfd_read(dev_t dev, uio_t *uio, cred_t *cr)
441 499 {
442 - signalfd_state_t *state;
500 + signalfd_state_t *state, **sstate;
443 501 minor_t minor = getminor(dev);
444 502 boolean_t block = B_TRUE;
445 503 k_sigset_t set;
446 504 boolean_t got_one = B_FALSE;
447 505 int res;
448 506
449 507 if (uio->uio_resid < sizeof (signalfd_siginfo_t))
450 508 return (EINVAL);
451 509
452 - state = ddi_get_soft_state(signalfd_softstate, minor);
510 + sstate = ddi_get_soft_state(signalfd_softstate, minor);
511 + state = *sstate;
453 512
454 513 if (uio->uio_fmode & (FNDELAY|FNONBLOCK))
455 514 block = B_FALSE;
456 515
457 516 mutex_enter(&state->sfd_lock);
458 517 set = state->sfd_set;
459 518 mutex_exit(&state->sfd_lock);
460 519
461 520 if (sigisempty(&set))
462 521 return (set_errno(EINVAL));
463 522
464 523 do {
465 - res = consume_signal(state->sfd_set, uio, block);
466 - if (res == 0)
524 + res = consume_signal(set, uio, block);
525 +
526 + if (res == 0) {
527 + /*
528 + * After consuming one signal, do not block while
529 + * trying to consume more.
530 + */
467 531 got_one = B_TRUE;
532 + block = B_FALSE;
468 533
469 - /*
470 - * After consuming one signal we won't block trying to consume
471 - * further signals.
472 - */
473 - block = B_FALSE;
534 + /*
535 + * Refresh the matching signal set in case it was
536 + * updated during the wait.
537 + */
538 + mutex_enter(&state->sfd_lock);
539 + set = state->sfd_set;
540 + mutex_exit(&state->sfd_lock);
541 + if (sigisempty(&set))
542 + break;
543 + }
474 544 } while (res == 0 && uio->uio_resid >= sizeof (signalfd_siginfo_t));
475 545
476 546 if (got_one)
477 547 res = 0;
478 548
479 549 return (res);
480 550 }
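
Because the loop above keeps consuming while uio_resid allows, a non-blocking
caller can drain several pending signals with a single read; roughly (fd is a
non-blocking signalfd descriptor and handle_signal a hypothetical consumer):

    signalfd_siginfo_t buf[8];
    ssize_t n, i;

    while ((n = read(fd, buf, sizeof (buf))) > 0) {
            for (i = 0; i < n / (ssize_t)sizeof (buf[0]); i++)
                    handle_signal(&buf[i]);
    }
    /* read fails with EAGAIN once the matching set is drained. */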
481 551
482 552 /*
483 553 * If ksigset_t's were a single word, we would do:
484 554 * return (((p->p_sig | t->t_sig) & set) & fillset);
485 555 */
486 556 static int
487 557 signalfd_sig_pending(proc_t *p, kthread_t *t, k_sigset_t set)
488 558 {
489 559 return (((p->p_sig.__sigbits[0] | t->t_sig.__sigbits[0]) &
490 560 set.__sigbits[0]) |
491 561 ((p->p_sig.__sigbits[1] | t->t_sig.__sigbits[1]) &
492 562 set.__sigbits[1]) |
493 563 (((p->p_sig.__sigbits[2] | t->t_sig.__sigbits[2]) &
494 564 set.__sigbits[2]) & FILLSET2));
495 565 }
496 566
497 567 _NOTE(ARGSUSED(4))
498 568 static int
499 569 signalfd_poll(dev_t dev, short events, int anyyet, short *reventsp,
500 570 struct pollhead **phpp)
501 571 {
502 - signalfd_state_t *state;
572 + signalfd_state_t *state, **sstate;
503 573 minor_t minor = getminor(dev);
504 574 kthread_t *t = curthread;
505 575 proc_t *p = ttoproc(t);
506 576 short revents = 0;
507 577
508 - state = ddi_get_soft_state(signalfd_softstate, minor);
578 + sstate = ddi_get_soft_state(signalfd_softstate, minor);
579 + state = *sstate;
509 580
510 581 mutex_enter(&state->sfd_lock);
511 582
512 583 if (signalfd_sig_pending(p, t, state->sfd_set) != 0)
513 584 revents |= POLLRDNORM | POLLIN;
514 585
515 586 mutex_exit(&state->sfd_lock);
516 587
517 588 if (!(*reventsp = revents & events) && !anyyet) {
518 - *phpp = &state->sfd_pollhd;
589 + sigfd_proc_state_t *pstate;
590 + sigfd_poll_waiter_t *pw;
519 591
520 592 /*
521 593 * Enable pollwakeup handling.
522 594 */
523 - if (p->p_sigfd == NULL) {
524 - sigfd_proc_state_t *pstate;
595 + mutex_enter(&p->p_lock);
596 + if ((pstate = (sigfd_proc_state_t *)p->p_sigfd) == NULL) {
525 597
526 - pstate = kmem_zalloc(sizeof (sigfd_proc_state_t),
527 - KM_SLEEP);
598 + mutex_exit(&p->p_lock);
599 + pstate = kmem_zalloc(sizeof (*pstate), KM_SLEEP);
528 600 list_create(&pstate->sigfd_list,
529 - sizeof (sigfd_wake_list_t),
530 - offsetof(sigfd_wake_list_t, sigfd_wl_lst));
601 + sizeof (sigfd_poll_waiter_t),
602 + offsetof(sigfd_poll_waiter_t, spw_list));
603 + pstate->sigfd_pollwake_cb = signalfd_pollwake_cb;
531 604
605 + /* Check again, after blocking for the alloc. */
532 606 mutex_enter(&p->p_lock);
533 - /* check again now that we're locked */
534 607 if (p->p_sigfd == NULL) {
535 608 p->p_sigfd = pstate;
536 609 } else {
537 610 /* someone beat us to it */
538 611 list_destroy(&pstate->sigfd_list);
539 - kmem_free(pstate, sizeof (sigfd_proc_state_t));
612 + kmem_free(pstate, sizeof (*pstate));
613 + pstate = p->p_sigfd;
540 614 }
541 - mutex_exit(&p->p_lock);
542 615 }
543 616
544 - mutex_enter(&p->p_lock);
545 - if (((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb ==
546 - NULL) {
547 - ((sigfd_proc_state_t *)p->p_sigfd)->sigfd_pollwake_cb =
548 - signalfd_pollwake_cb;
549 - }
550 - signalfd_wake_list_add(state);
617 + pw = signalfd_wake_list_add(pstate, state);
618 + *phpp = &pw->spw_pollhd;
551 619 mutex_exit(&p->p_lock);
552 620 }
553 621
554 622 return (0);
555 623 }
556 624
557 625 _NOTE(ARGSUSED(4))
558 626 static int
559 627 signalfd_ioctl(dev_t dev, int cmd, intptr_t arg, int md, cred_t *cr, int *rv)
560 628 {
561 - signalfd_state_t *state;
629 + signalfd_state_t *state, **sstate;
562 630 minor_t minor = getminor(dev);
563 631 sigset_t mask;
564 632
565 - state = ddi_get_soft_state(signalfd_softstate, minor);
633 + sstate = ddi_get_soft_state(signalfd_softstate, minor);
634 + state = *sstate;
566 635
567 636 switch (cmd) {
568 637 case SIGNALFDIOC_MASK:
569 638 if (ddi_copyin((caddr_t)arg, (caddr_t)&mask, sizeof (sigset_t),
570 639 md) != 0)
571 640 return (set_errno(EFAULT));
572 641
573 642 mutex_enter(&state->sfd_lock);
574 643 sigutok(&mask, &state->sfd_set);
575 644 mutex_exit(&state->sfd_lock);
576 645
577 646 return (0);
578 647
579 648 default:
580 649 break;
581 650 }
582 651
583 652 return (ENOTTY);
584 653 }
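
At the device level, the mask update handled above reduces to the following
(a sketch; applications normally reach it through signalfd(3C)):

    #include <sys/signalfd.h>
    #include <sys/ioctl.h>
    #include <signal.h>
    #include <fcntl.h>
    #include <unistd.h>
    #include <err.h>

    int
    main(void)
    {
            sigset_t mask;
            int fd;

            if ((fd = open("/dev/signalfd", O_RDWR)) < 0)
                    err(1, "open");

            (void) sigemptyset(&mask);
            (void) sigaddset(&mask, SIGINT);

            /* SIGNALFDIOC_MASK copies the sigset_t in, replacing sfd_set. */
            if (ioctl(fd, SIGNALFDIOC_MASK, &mask) != 0)
                    err(1, "SIGNALFDIOC_MASK");

            (void) close(fd);
            return (0);
    }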
585 654
586 655 _NOTE(ARGSUSED(1))
587 656 static int
588 657 signalfd_close(dev_t dev, int flag, int otyp, cred_t *cred_p)
589 658 {
590 - signalfd_state_t *state, **sp;
659 + signalfd_state_t *state, **sstate;
660 + sigfd_poll_waiter_t *pw = NULL;
591 661 minor_t minor = getminor(dev);
592 662 proc_t *p = curproc;
593 663
594 - state = ddi_get_soft_state(signalfd_softstate, minor);
664 + sstate = ddi_get_soft_state(signalfd_softstate, minor);
665 + state = *sstate;
595 666
596 - if (state->sfd_pollhd.ph_list != NULL) {
597 - pollwakeup(&state->sfd_pollhd, POLLERR);
598 - pollhead_clean(&state->sfd_pollhd);
599 - }
600 -
601 - /* Make sure our state is removed from our proc's pollwake list. */
667 + /* Make sure state is removed from this proc's pollwake list. */
602 668 mutex_enter(&p->p_lock);
603 - signalfd_wake_list_rm(p, state);
669 + if (p->p_sigfd != NULL) {
670 + sigfd_proc_state_t *pstate = p->p_sigfd;
671 +
672 + pw = signalfd_wake_list_rm(pstate, state);
673 + if (list_is_empty(&pstate->sigfd_list)) {
674 + signalfd_wake_list_cleanup(p);
675 + }
676 + }
604 677 mutex_exit(&p->p_lock);
605 678
679 + if (pw != NULL) {
680 + pollwakeup(&pw->spw_pollhd, POLLERR);
681 + pollhead_clean(&pw->spw_pollhd);
682 + kmem_free(pw, sizeof (*pw));
683 + }
684 +
606 685 mutex_enter(&signalfd_lock);
607 686
608 - /* Remove our state from our global list. */
609 - for (sp = &signalfd_state; *sp != state; sp = &((*sp)->sfd_next))
610 - VERIFY(*sp != NULL);
611 -
612 - *sp = (*sp)->sfd_next;
613 -
687 + *sstate = NULL;
614 688 ddi_soft_state_free(signalfd_softstate, minor);
615 689 id_free(signalfd_minor, minor);
616 690
691 + signalfd_state_release(state, B_TRUE);
692 +
617 693 mutex_exit(&signalfd_lock);
618 694
619 695 return (0);
620 696 }
621 697
622 698 static int
623 699 signalfd_attach(dev_info_t *devi, ddi_attach_cmd_t cmd)
624 700 {
625 701 if (cmd != DDI_ATTACH || signalfd_devi != NULL)
626 702 return (DDI_FAILURE);
627 703
628 704 mutex_enter(&signalfd_lock);
629 705
630 706 signalfd_minor = id_space_create("signalfd_minor", 1, L_MAXMIN32 + 1);
631 707 if (signalfd_minor == NULL) {
632 708 cmn_err(CE_WARN, "signalfd couldn't create id space");
633 709 mutex_exit(&signalfd_lock);
634 710 return (DDI_FAILURE);
635 711 }
636 712
637 713 if (ddi_soft_state_init(&signalfd_softstate,
638 - sizeof (signalfd_state_t), 0) != 0) {
714 + sizeof (signalfd_state_t *), 0) != 0) {
639 715 cmn_err(CE_WARN, "signalfd failed to create soft state");
640 716 id_space_destroy(signalfd_minor);
641 717 mutex_exit(&signalfd_lock);
642 718 return (DDI_FAILURE);
643 719 }
644 720
645 721 if (ddi_create_minor_node(devi, "signalfd", S_IFCHR,
646 722 SIGNALFDMNRN_SIGNALFD, DDI_PSEUDO, NULL) == DDI_FAILURE) {
647 723 cmn_err(CE_NOTE, "/dev/signalfd couldn't create minor node");
648 724 ddi_soft_state_fini(&signalfd_softstate);
649 725 id_space_destroy(signalfd_minor);
650 726 mutex_exit(&signalfd_lock);
651 727 return (DDI_FAILURE);
652 728 }
653 729
654 730 ddi_report_dev(devi);
655 731 signalfd_devi = devi;
656 732
657 733 sigfd_exit_helper = signalfd_exit_helper;
658 734
735 + list_create(&signalfd_state, sizeof (signalfd_state_t),
736 + offsetof(signalfd_state_t, sfd_list));
737 +
738 + signalfd_wakeq = taskq_create("signalfd_wake", 1, minclsyspri,
739 + 0, INT_MAX, TASKQ_PREPOPULATE);
740 +
659 741 mutex_exit(&signalfd_lock);
660 742
661 743 return (DDI_SUCCESS);
662 744 }
663 745
664 746 _NOTE(ARGSUSED(0))
665 747 static int
666 748 signalfd_detach(dev_info_t *dip, ddi_detach_cmd_t cmd)
667 749 {
668 750 switch (cmd) {
669 751 case DDI_DETACH:
670 752 break;
671 753
672 754 default:
673 755 return (DDI_FAILURE);
674 756 }
675 757
676 - /* list should be empty */
677 - VERIFY(signalfd_state == NULL);
678 -
679 758 mutex_enter(&signalfd_lock);
759 +
760 + if (!list_is_empty(&signalfd_state)) {
761 + /*
762 + * There are dangling poll waiters holding signalfd_state_t
763 + * entries on the global list. Detach is not possible until
764 + * they purge themselves.
765 + */
766 + mutex_exit(&signalfd_lock);
767 + return (DDI_FAILURE);
768 + }
769 + list_destroy(&signalfd_state);
770 +
771 + /*
772 + * With no remaining entries in the signalfd_state list, the wake taskq
773 + * should be empty with no possibility for new entries.
774 + */
775 + taskq_destroy(signalfd_wakeq);
776 +
680 777 id_space_destroy(signalfd_minor);
681 778
682 779 ddi_remove_minor_node(signalfd_devi, NULL);
683 780 signalfd_devi = NULL;
684 781 sigfd_exit_helper = NULL;
685 782
686 783 ddi_soft_state_fini(&signalfd_softstate);
687 784 mutex_exit(&signalfd_lock);
688 785
689 786 return (DDI_SUCCESS);
690 787 }
691 788
692 789 _NOTE(ARGSUSED(0))
693 790 static int
694 791 signalfd_info(dev_info_t *dip, ddi_info_cmd_t infocmd, void *arg, void **result)
695 792 {
696 793 int error;
697 794
698 795 switch (infocmd) {
699 796 case DDI_INFO_DEVT2DEVINFO:
700 797 *result = (void *)signalfd_devi;
701 798 error = DDI_SUCCESS;
702 799 break;
703 800 case DDI_INFO_DEVT2INSTANCE:
704 801 *result = (void *)0;
705 802 error = DDI_SUCCESS;
706 803 break;
707 804 default:
708 805 error = DDI_FAILURE;
709 806 }
710 807 return (error);
711 808 }
712 809
713 810 static struct cb_ops signalfd_cb_ops = {
714 811 signalfd_open, /* open */
715 812 signalfd_close, /* close */
716 813 nulldev, /* strategy */
717 814 nulldev, /* print */
718 815 nodev, /* dump */
719 816 signalfd_read, /* read */
720 817 nodev, /* write */
721 818 signalfd_ioctl, /* ioctl */
722 819 nodev, /* devmap */
723 820 nodev, /* mmap */
724 821 nodev, /* segmap */
725 822 signalfd_poll, /* poll */
726 823 ddi_prop_op, /* cb_prop_op */
727 824 0, /* streamtab */
728 825 D_NEW | D_MP /* Driver compatibility flag */
729 826 };
730 827
731 828 static struct dev_ops signalfd_ops = {
732 829 DEVO_REV, /* devo_rev */
733 830 0, /* refcnt */
734 831 signalfd_info, /* get_dev_info */
735 832 nulldev, /* identify */
736 833 nulldev, /* probe */
737 834 signalfd_attach, /* attach */
738 835 signalfd_detach, /* detach */
739 836 nodev, /* reset */
740 837 &signalfd_cb_ops, /* driver operations */
741 838 NULL, /* bus operations */
742 839 nodev, /* dev power */
743 840 ddi_quiesce_not_needed, /* quiesce */
744 841 };
745 842
746 843 static struct modldrv modldrv = {
747 844 &mod_driverops, /* module type (this is a pseudo driver) */
748 845 "signalfd support", /* name of module */
749 846 &signalfd_ops, /* driver ops */
750 847 };
751 848
752 849 static struct modlinkage modlinkage = {
753 850 MODREV_1,
754 851 (void *)&modldrv,
755 852 NULL
756 853 };
757 854
758 855 int
759 856 _init(void)
760 857 {
761 858 return (mod_install(&modlinkage));
762 859 }
763 860
764 861 int
765 862 _info(struct modinfo *modinfop)
766 863 {
767 864 return (mod_info(&modlinkage, modinfop));
768 865 }
769 866
770 867 int
771 868 _fini(void)
772 869 {
773 870 return (mod_remove(&modlinkage));
774 871 }