NOTE: I've updated the source pointers to point at the illumos source tree.
Some of the text in here is out of date, because it dates from 2005.
An OpenSolaris IPsec Hello
Hi! Those of you visiting here
probably know I'm one of the IPsec guys (actually, I'm the original IPsec
guy) here in
Solaris-land.
Bill may also
have some stuff to say about the IPsec source in Solaris.
The kernel source for IPsec (AH, ESP, and the internal databases) lives in
usr/src/uts/common/inet/ip/,
because we're an integral part of our IP implementation. I should warn you
now that there's a mixture of STREAMS boundaries and function calls between
the different parts of the IPsec subsystem. It used to be almost all
STREAMS, because of broken US Export restrictions (across all political party
lines, BTW). We figured we could sell it as exportable to the powers that be
more easily if we
used a "general-purpose interface" which allowed for easy module perforation
for moving our data around. As the restrictions loosened up, we were able to
streamline things somewhat. We hope to do even more now that OpenSolaris is
available. There are bits and pieces of the actual Solaris IPsec missing
from OpenSolaris (especially from ESP) that will show up on OpenSolaris soon
as well, now that we're officially open-source. (It's a bit of a
chicken-and-egg problem.)
This entry will be discussing the PF_KEY implementation
in Solaris. I assume you know something about how IPsec works, have read RFC
2367, and have a handle on TCP/IP protocol suite principles.
A Brief PF_KEY Synopsis
PF_KEY is analogous to the PF_ROUTE routing socket. See Keith Sklower's
Radix-Tree paper at his site for the introduction to routing
sockets. Where the routing socket manipulates IP forwarding entries (or
routes), the PF_KEY socket manipulates IPsec Security Associations (SAs). A
user-space application sends a message to the kernel telling it to ADD,
DELETE, or UPDATE SAs, and the kernel sends back a message indicating either
success or failure.
The paper makes mention of a message that's
little-used in most PF_ROUTE implementations -- RTM_RESOLVE. RTM_RESOLVE
allows a user-space application to resolve an address, e.g. a user-space ARP.
This inspired PF_KEY's similar message, SADB_ACQUIRE, which is used to
tell a user-space key management (KM) daemon that an outgoing IPsec SA is
needed.
RFC 2367 has the specification for a PF_KEY socket.
Solaris Changes from RFC 2367
Most, if not all, existing PF_KEY implementations either alter
or add to the message types in RFC 2367. Most changes were made
because:
- RFC 2367 does not mesh well with some KM
protocols (esp. IKEv1).
- The UNIX errno space is not sufficient
to describe some failures.
- Some implementors thought PF_KEY
would be a suitable place to put their IPsec Security Policy Database (SPD)
manipulations.
Solaris addresses the last bullet by introducing a separate PF_POLICY
socket for SPD manipulation. The other issues, however, were a problem for
us.
All of our changes to PF_KEY were a direct result of implementing the
Internet Key Exchange (IKE) as part of our work in Solaris 9. They are
summarized below:
- Extended ACQUIRE - Instead of sending up a
message for every IPsec SA that is not present, send up a list of what is
needed to protect the packet to the listening KM daemon. This allows a
packet that requires AH and ESP to express that protection in one ACQUIRE
message.
- Extended REGISTER - Goes hand-in-hand with
Extended ACQUIRE. You tell the kernel that you can handle the Extended
ACQUIRE.
- Inverse ACQUIRE - The closest to policy
manipulation we come in PF_KEY... it's a one-time consultation of the IPsec
SPD, and you get as an answer an Extended ACQUIRE, just as if an outbound
packet were triggering it. This is useful for IKE responders, and for
diagnostic listeners on the PF_KEY socket.
- Diagnostic codes - EINVAL is a frequently occurring value for
sadb_msg_errno. Was
it a weak DES key? Was it a botched sockaddr structure? The
reserved field in struct sadb_msg now contains useful extra data
when an EINVAL occurs.
- typedefs - It's easier to type sadb_ext_t instead of struct sadb_ext.
- 64-bit alignment - RFC 2367 claims all
PF_KEY structures can be aligned on 64-bit boundaries. In Solaris, we force
it to happen. That's why net/pfkeyv2.h has a lot of unions in most of its
structure definitions.
But Dan... weren't you an author of RFC 2367?!?
Yes I was. Hence the question: Dude, Where's My Spec?
I wasn't allowed (yes, I'm serious; and no, it
had nothing to do with any government interference) to work on IPsec or IKE
when I first got to Sun, but the RFC was work that was a continuation from my
previous job. In hindsight, I think we should've been paying more attention
to the customers (authors of KM daemons, of which I'd be one someday). I was
wrapped up in non-IPsec work at Sun when I wasn't working on what would
become RFC 2367, and I split my attention in a non-optimal
fashion.
Enough yapping, let's see some code!
The first place to look is usr/src/uts/common/net/pfkeyv2.h, which gets
deposited into /usr/include/net/ on a running
system. You'll notice every structure that doesn't have a field of type
uint64_t will have a union in it. Here's the base PF_KEY
message:
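A sketch of that layout, reconstructed from RFC 2367 and the description in this entry rather than copied from the header. The RFC field names are real; the union and its member names (sadb_x_msg_u and friends) are assumptions about how the _X_ convention is applied, so treat net/pfkeyv2.h as the authority.

```c
#include <assert.h>
#include <stdint.h>

typedef struct sadb_msg {
	uint8_t  sadb_msg_version;	/* PF_KEY_V2 */
	uint8_t  sadb_msg_type;		/* SADB_ADD, SADB_UPDATE, ... */
	uint8_t  sadb_msg_errno;	/* value from the UNIX errno space */
	uint8_t  sadb_msg_satype;	/* AH, ESP, ... */
	uint16_t sadb_msg_len;		/* total length, in 64-bit words */
	uint16_t sadb_msg_reserved;	/* reused for the diagnostic code */
	/* The union exists only to guarantee 64-bit alignment. */
	union {
		struct {
			uint32_t sadb_x_msg_useq;	/* assumed name */
			uint32_t sadb_x_msg_upid;	/* assumed name */
		} sadb_x_msg_actual;
		uint64_t sadb_x_msg_alignment;
	} sadb_x_msg_u;
} sadb_msg_t;

/* Keep the RFC 2367 field names usable despite the union. */
#define	sadb_msg_seq	sadb_x_msg_u.sadb_x_msg_actual.sadb_x_msg_useq
#define	sadb_msg_pid	sadb_x_msg_u.sadb_x_msg_actual.sadb_x_msg_upid

/* Same size as the RFC layout, but as strictly aligned as a uint64_t. */
static_assert(sizeof (sadb_msg_t) == 16, "base header is two 64-bit words");
static_assert(_Alignof(sadb_msg_t) == _Alignof(uint64_t),
    "union forces uint64_t alignment");
```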
Notice that every extra field that is not
in RFC 2367 uses the _X_ naming convention. In the case above, we took the
two uint32_ts and merged them into a union with a uint64_t so that we can
force 64-bit alignment on sadb_msg_t. This makes PF_KEY message
manipulations 64-bit happy.
If you look at an extension that has a
64-bit type in it already, you'll see that there's no alignment-forcing union
inserted into the definition:
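The lifetime extension is a handy illustration. The layout below is the one RFC 2367 specifies; that the Solaris header matches it field for field is an assumption. Its 64-bit byte and time fields already give the structure 64-bit alignment, so there is nothing for a union to do.

```c
#include <assert.h>
#include <stdint.h>

typedef struct sadb_lifetime {
	uint16_t sadb_lifetime_len;		/* length, in 64-bit words */
	uint16_t sadb_lifetime_exttype;		/* CURRENT, HARD, or SOFT */
	uint32_t sadb_lifetime_allocations;
	uint64_t sadb_lifetime_bytes;
	uint64_t sadb_lifetime_addtime;		/* 64-bit time values */
	uint64_t sadb_lifetime_usetime;
} sadb_lifetime_t;

/* Four 64-bit words with no padding and no alignment-forcing union. */
static_assert(sizeof (sadb_lifetime_t) == 32, "lifetime is four words");
```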
(Yes, PF_KEY is Y2038-ready, as long as the KM application is compiled as
64-bit.)
For people interested in applications that use PF_KEY, I'd suggest
investigating the ipseckey(1m) command. Its source can be found in
usr/src/cmd/cmd-inet/usr.sbin/ipseckey.c. This program does not use
command-line editing yet, but as an example of a PF_KEY consumer, it does
the job.
All relevant PF_KEY internal-implementation headers live in
usr/src/uts/common/inet. Relevant header files are:
- ipsec_info.h -
The structures for M_CTL messages prepended to data that gets passed around
between IPsec STREAMS modules. The keysock consumer interface definitions
are important here. Note the keysock_in_t,
especially ks_in_extv[]. This vector allows easy access to all
PF_KEY extension headers. (As we remove STREAMS from IPsec, this file
will shrink. If all goes well, all that will remain (in some form) are
IPSEC_IN and IPSEC_OUT messages, and that's because you can't enforce policy
without some form of packet
tagging.)
- keysock.h -
The keysock driver implements the PF_KEY socket interface at its
most basic. A keysock_t represents an open PF_KEY socket, and
a keysock_consumer_t represents a consumer of PF_KEY messages
(i.e. AH and ESP).
All relevant PF_KEY internal-implementation source lives one level down from
the headers, in the ip/ directory. The files are:
- keysockddi.c - DDI and module loading glue.
- keysock.c -
STREAMS driver that implements the PF_KEY socket as /dev/keysock. It
is also a STREAMS module that sits atop AH and ESP listening for their
messages.
- sadb.c -
Where IPsec's Security Association Database (SADB) is mostly
implemented. You'll notice an ip_sadb.c
file, because we want the fast-path lookups to be in IP without going through
modstubs.
Most of this entry will be spent in keysock.c. Other portions of the
subsystem will be visited by one of Team IPsec, either when we have cycles or
if there are enough requests.
keysock either handles messages from a PF_KEY socket, or from the SADB and
its consumers. The heavy lifting for user-generated PF_KEY messages is in
keysock_parse().
The first thing keysock_parse() does is perform some reality checks. First
off, the actual data length should match what's in the sadb_msg_len field.
(Note the first of many SADB_nnTOmm() macros, for converting units of 64-bits
to units of 8-bits, etc.) The first really interesting part of
reality-checking is the "extension vector", or as it's called in the IPsec
code, the extv. A PF_KEY message has a base header, followed by one or more
extension headers. Let me quote this section from RFC 2367:
The keysock code takes this to heart in keysock_get_ext(). The extv is a
vector of sadb_ext_t pointers, where the specific extension type
(SADB_EXT_foo) can be found by merely indexing into the vector by that
value. Say you want the
sadb_sa_t extension:
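A minimal sketch of that lookup. The types are trimmed stand-ins following the RFC 2367 layouts so the example compiles on its own, find_sa() is a hypothetical helper, and SADB_EXT_SA and SADB_EXT_MAX carry their RFC 2367 values (the Solaris header may define a larger maximum).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define	SADB_EXT_SA	1	/* RFC 2367 value */
#define	SADB_EXT_MAX	16	/* RFC 2367 value */

/* Generic extension header: every extension starts with these fields. */
typedef struct sadb_ext {
	uint16_t sadb_ext_len;		/* length, in 64-bit words */
	uint16_t sadb_ext_type;		/* SADB_EXT_foo */
} sadb_ext_t;

/* Association extension, RFC 2367 layout. */
typedef struct sadb_sa {
	uint16_t sadb_sa_len;
	uint16_t sadb_sa_exttype;
	uint32_t sadb_sa_spi;
	uint8_t  sadb_sa_replay;
	uint8_t  sadb_sa_state;
	uint8_t  sadb_sa_auth;
	uint8_t  sadb_sa_encrypt;
	uint32_t sadb_sa_flags;
} sadb_sa_t;

/*
 * Hypothetical helper: fetch the SA extension from an extension vector.
 * Returns NULL when the message did not carry one.
 */
static sadb_sa_t *
find_sa(sadb_ext_t *extv[])
{
	assert(extv != NULL);
	return ((sadb_sa_t *)extv[SADB_EXT_SA]);
}
```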
The above code snippet shows what you need to do. As we generate the extv,
if we see a collision, we
return EINVAL (with an appropriate diagnostic). We do not enforce extension
ordering inbound or outbound. Once keysock is done with first-pass
reality-checks, the extv is sent around (as part of the KEYSOCK_IN M_CTL that
is prepended to the data) to all who need it.
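The first-pass checks described above can be sketched as follows. This is not the keysock code: build_extv() and the EXTV_* results are hypothetical names, the local SADB_64TO8() is only modelled on the SADB_nnTOmm() family, and the sketch returns a bare enum where keysock reports EINVAL plus a diagnostic. It assumes the message buffer is 64-bit aligned and in host byte order.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SADB_EXT_MAX	16			/* RFC 2367 value */
#define	SADB_64TO8(x)	((size_t)(x) << 3)	/* 64-bit words to bytes */
#define	MSG_HDR_BYTES	16	/* size of the RFC 2367 base header */
#define	MSG_LEN_OFFSET	4	/* offset of sadb_msg_len within it */

typedef struct sadb_ext {
	uint16_t sadb_ext_len;		/* length, in 64-bit words */
	uint16_t sadb_ext_type;		/* SADB_EXT_foo */
} sadb_ext_t;

enum extv_result { EXTV_OK, EXTV_BAD_LEN, EXTV_BAD_EXT, EXTV_DUPLICATE };

/* Hypothetical helper: validate lengths and fill extv[], indexed by type. */
static enum extv_result
build_extv(const void *msg, size_t msglen,
    const sadb_ext_t *extv[SADB_EXT_MAX + 1])
{
	const uint8_t *cur = msg;
	const uint8_t *end;
	uint16_t len64;
	size_t i;

	assert(msg != NULL && extv != NULL);
	for (i = 0; i <= SADB_EXT_MAX; i++)
		extv[i] = NULL;

	/* The data length must match what sadb_msg_len claims. */
	if (msglen < MSG_HDR_BYTES)
		return (EXTV_BAD_LEN);
	memcpy(&len64, cur + MSG_LEN_OFFSET, sizeof (len64));
	if (SADB_64TO8(len64) != msglen)
		return (EXTV_BAD_LEN);

	end = cur + msglen;
	cur += MSG_HDR_BYTES;
	/* All lengths are multiples of 8, so at least 8 bytes remain here. */
	while (cur < end) {
		sadb_ext_t ext;
		size_t extlen;

		memcpy(&ext, cur, sizeof (ext));
		extlen = SADB_64TO8(ext.sadb_ext_len);
		if (extlen == 0 || extlen > (size_t)(end - cur) ||
		    ext.sadb_ext_type == 0 || ext.sadb_ext_type > SADB_EXT_MAX)
			return (EXTV_BAD_EXT);
		/* A second extension of the same type is a collision. */
		if (extv[ext.sadb_ext_type] != NULL)
			return (EXTV_DUPLICATE);
		extv[ext.sadb_ext_type] = (const sadb_ext_t *)cur;
		cur += extlen;
	}
	return (EXTV_OK);
}
```

Because the vector is indexed by type rather than by position, the order in which extensions appear in the message does not matter to anyone who receives it.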
Most messages are shuttled off to keysock_passdown() for sending off via
STREAMS to either AH or ESP. Unusual inbound messages for keysock are the
SADB_REGISTER, SADB_FLUSH, SADB_DUMP, SADB_ACQUIRE, and
SADB_X_INVERSE_ACQUIRE ones.
SADB_REGISTER sets socket state and informs consumers about the register.
If an SADB_REGISTER is sent for a specific SA type (e.g. ESP, AH), then the
message is treated like a common-case message, except that when its reply
arrives, keysock_t state is altered to indicate a registered socket. If the
sadb_msg_satype is set to 0, then the message is an EXTENDED register, and an
extended-register extension (sadb_x_ereg_t) is required. keysock converts the
0-terminated list of one-byte values into a bit vector internally. Then it
sends the message to consumers like a normal REGISTER.
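The list-to-bit-vector conversion might look like the sketch below. ereg_to_bitmap() is a hypothetical name, and the 64-bit bitmap width (with larger SA types rejected) is an assumption made for brevity; keysock's internal representation may differ. The SADB_SATYPE_* values are the RFC 2367 ones.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define	SADB_SATYPE_AH	2	/* RFC 2367 value */
#define	SADB_SATYPE_ESP	3	/* RFC 2367 value */

/*
 * Hypothetical helper: turn a 0-terminated list of one-byte SA types into
 * a bitmap with bit N set for SA type N. Fails if the list has no
 * terminator within maxlen bytes or names a type that does not fit.
 */
static bool
ereg_to_bitmap(const uint8_t *satypes, size_t maxlen, uint64_t *bitmap)
{
	uint64_t bits = 0;
	size_t i;

	assert(satypes != NULL && bitmap != NULL);
	for (i = 0; i < maxlen && satypes[i] != 0; i++) {
		if (satypes[i] >= 64)
			return (false);
		bits |= UINT64_C(1) << satypes[i];
	}
	if (i == maxlen)
		return (false);	/* ran off the end without a terminator */
	*bitmap = bits;
	return (true);
}
```

With the bitmap in hand, checking whether a socket registered for a given SA type is a single mask test.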
An inbound SADB_ACQUIRE can be used to signal other KM applications (for
instance, if PF_KEY is used to keep keys in the kernel for user-space
consumers). The more common case, however, is a negative ACQUIRE, which means
a KM negotiation failed and the internal ACQUIRE record (more on this in a
bit) needs to be cancelled.
SADB_FLUSH and SADB_DUMP messages need to lock down the keysock module until
their respective operations are finished. FLUSH doesn't take as long, but
DUMP needs to keep track of all consumer-originated replies until the
consumer indicates it is done.
SADB_X_PROMISC merely changes some keysock state. It never
goes to a consumer.
The SADB_X_INVERSE_ACQUIRE handling is a glimpse of things to come for
keysock. It does not use the keysock_passdown() method of calling a
consumer. It instead calls directly into IPsec (and if we had other
in-kernel consumers, it would directly call to those) and returns a message
to the user immediately.
The keysock_rput() function handles all messages from consumers. The
KEYSOCK_OUT portion of the switch checks for FLUSH and DUMP messages, and
releases the clamps on keysock if the final message for a FLUSH or a DUMP has
been received. Otherwise, keysock_passup() does the work.
keysock_passup() is conceptually much simpler than keysock_parse(). It
merely has to make one of three delivery decisions:
- Do I deliver to all PF_KEY sockets?
- Do I deliver to all registered PF_KEY sockets?
- Do I deliver merely to the sender?
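Once the decision is made, delivery reduces to a per-socket filter. A sketch, with hypothetical names throughout (deliver_t, demo_sock_t, should_deliver()); identifying the sender by its PID and tracking registration as a single flag are simplifying assumptions, not a description of keysock_t.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* The three delivery decisions. */
typedef enum {
	DELIVER_ALL,		/* every open PF_KEY socket */
	DELIVER_REGISTERED,	/* only sockets that have registered */
	DELIVER_SENDER		/* only the socket that sent the request */
} deliver_t;

/* Hypothetical stand-in for per-socket state. */
typedef struct demo_sock {
	uint32_t pid;		/* assumed way of identifying the sender */
	bool registered;	/* assumed single flag */
} demo_sock_t;

/* Should a message with this delivery decision go to this socket? */
static bool
should_deliver(deliver_t how, const demo_sock_t *sock, uint32_t sender_pid)
{
	assert(sock != NULL);
	switch (how) {
	case DELIVER_ALL:
		return (true);
	case DELIVER_REGISTERED:
		return (sock->registered);
	case DELIVER_SENDER:
		return (sock->pid == sender_pid);
	}
	return (false);
}
```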
The more interesting parts of PF_KEY are handled inside the SADB code
(sadb.c) and in the consumers. Those will be the subject of one or more other
entries, because of all of the interaction with the IPsec SADB.