Kebe Says - Dan McD's Blog» IPsec

Detangling IPsec NAT-Traversal, and a more stable API

— Dan McD. @ September 4, 2007 10:28

As of OpenSolaris build 73, the way we do IPsec NAT-Traversal changes for the cleaner.

Before this build, IPsec NAT-Traversal was performed by pushing a STREAMS module on top of an open UDP socket. This module (nattymod) would either strip UDP headers out of ESP-in-UDP packets, or strip the "0-SPI" marker (four bytes of zeroes) before passing the datagram up to the application.

This method worked, but it had some flaws, including the implicit setting of certain socket options (UDP_INCLHDR) that would then potentially be blocked from applications that actually required them. Also, nattymod did not insert the 0-SPI automatically; the application was stuck doing that on its own. And while FireEngine merged TCP into IP for S10, we needed to wait for one of the earlier builds of OpenSolaris to get the UDP equivalent.

With the new NAT-Traversal scheme, a key management application (like our closed-source in.iked(1M)) that wishes to aid in NAT-Traversal simply sets a new socket option: UDP_NAT_T_ENDPOINT. If this option is set, the following things happen:

On inbound packets, the first four bytes after the UDP header are inspected.

If there are fewer than four bytes, the packet is dropped, and assumed to be a NAT-T keepalive.
If the four bytes are all zeros (i.e. the 0-SPI), they are stripped and regular UDP processing occurs.
Otherwise, the UDP header is stripped and the packet is shuffled off to IPsec's ESP for processing.

On outbound packets

The 0-SPI is inserted between the UDP header and the user-generated data.
ESP will send ESP-in-UDP by itself depending on the Security Association's properties.
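
The inbound and outbound rules above can be modelled in user space. What follows is a minimal sketch and not the kernel code: the names natt_verdict_t, natt_classify_inbound(), and natt_prepend_zero_spi() are hypothetical, and the functions operate only on the bytes that follow the UDP header.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	NATT_ZERO_SPI_LEN	4	/* the "0-SPI" marker: four zero bytes */

/* Hypothetical verdicts; the kernel does not export names like these. */
typedef enum {
	NATT_DROP_KEEPALIVE,	/* too short: treat as a NAT-T keepalive */
	NATT_DELIVER_UDP,	/* 0-SPI found: strip it, deliver to the socket */
	NATT_HANDOFF_ESP	/* anything else: ESP-in-UDP, give it to ESP */
} natt_verdict_t;

/*
 * Classify the bytes following the UDP header of an inbound datagram.
 * On NATT_DELIVER_UDP, *skip is the number of marker bytes to strip.
 */
static natt_verdict_t
natt_classify_inbound(const uint8_t *payload, size_t len, size_t *skip)
{
	static const uint8_t zero_spi[NATT_ZERO_SPI_LEN] = { 0 };

	*skip = 0;
	if (len < NATT_ZERO_SPI_LEN)
		return (NATT_DROP_KEEPALIVE);
	if (memcmp(payload, zero_spi, NATT_ZERO_SPI_LEN) == 0) {
		*skip = NATT_ZERO_SPI_LEN;
		return (NATT_DELIVER_UDP);
	}
	return (NATT_HANDOFF_ESP);
}

/*
 * Outbound: place the 0-SPI ahead of the user data. Returns the new length,
 * or 0 if the output buffer is too small.
 */
static size_t
natt_prepend_zero_spi(const uint8_t *data, size_t len, uint8_t *out,
    size_t outlen)
{
	if (outlen < NATT_ZERO_SPI_LEN || len > outlen - NATT_ZERO_SPI_LEN)
		return (0);
	(void) memset(out, 0, NATT_ZERO_SPI_LEN);
	(void) memcpy(out + NATT_ZERO_SPI_LEN, data, len);
	return (len + NATT_ZERO_SPI_LEN);
}
```

With UDP_NAT_T_ENDPOINT set, the application sees neither step: both the strip and the insert happen below the socket.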

This will help anyone who wants to port their open-source IKE or other key management application to Solaris deal with the possibility of NAT boxes.

And on a related note, this will be mentioned during Nicolas Droux's OpenSolaris Networking for Developers talk next week at Sun Tech Days in Boston. I'll be there too, talking about S10 and OpenSolaris security features, as well as being in the audience for Nicolas's talk.

Not using "ncp" on Niagara considered harmful

— Dan McD. @ July 16, 2007 11:22

One of our IPsec remote-access servers here is on a Niagara-powered T2000 server. It's really overkill for the job, but we get to see how IKE and the Niagara crypto accelerator (known as "ncp" by its driver name) interact.

The nice thing about running your own stuff is you find things out before others do. Consider bug 6339802. We saw AWFUL IKE performance on Niagara boxes before we fixed this. Admittedly, IKE is single-threaded (for reasons beyond the scope of this blog entry), but it was taking seconds to complete an IKE Phase I with 2048-bit RSA and 1536-bit Diffie-Hellman.

More recently, we've been enabling bigger Diffie-Hellman MODP groups in our IKE. The Niagara driver has a limit of 2048-bit operations, so we limited the Phase I DH to 2048 bits.

Here's a DTrace script we like to use to measure responder-side Phase I times (in.iked and libike are closed-source, but trust me on this one):

#!/usr/sbin/dtrace -s

/*
 * Responder-side Phase I setup.
 */
pid$1::ssh_policy_new_connection:entry
{
        self->negstart[arg0] = timestamp;
        printf("Initial packet received, pm_info = %p", arg0);
}

pid$1::ssh_policy_negotiation_done_isakmp:entry
{
        /* Use 16384 value for "CONNECTED" from isakmp_doi.h */
        printf("return %d - %s ", arg1,
            (arg1 == 16384) ? "Success" : "Error case.");

        printf("pm_info %p finished, took %d ns", arg0,
            timestamp - self->negstart[arg0]);
}

With the fix for 6339802 in place, we can get pretty good phase 1 times....


dtrace: script '/space/responder-phase1.d' matched 4 probes
CPU     ID                    FUNCTION:NAME
  4  48040  ssh_policy_new_connection:entry Initial packet received, pm_info = bed6c8
  4  48042 ssh_policy_negotiation_done_isakmp:entry return 16384 - Success pm_info bed6c8 finished, took 165512764 ns

That's 165msec. Some of that time is packet round-trips, but let's ignore that now for this exercise.

Now let's do something drastic: cryptoadm disable provider=ncp/0 all. Suddenly those seconds come back...


  4  48040  ssh_policy_new_connection:entry Initial packet received, pm_info = bed6c8
  4  48042 ssh_policy_negotiation_done_isakmp:entry return 16384 - Success pm_info bed6c8 finished, took 7732419300 ns

WOW! Like Darren said, that's blog-worthy, and that's why I'm here.
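
To put the two measurements side by side, here is a small C sketch that converts the nanosecond figures from the two DTrace runs into milliseconds and computes the slowdown. The macro and function names are mine, not part of any tool.

```c
#include <stdint.h>

/* Phase I durations copied from the two DTrace runs, in nanoseconds. */
#define	PHASE1_WITH_NCP_NS	165512764ULL
#define	PHASE1_WITHOUT_NCP_NS	7732419300ULL

/* Convert nanoseconds to milliseconds. */
static double
ns_to_ms(uint64_t ns)
{
	return ((double)ns / 1e6);
}

/* How many times slower the software-only run was. */
static double
slowdown(uint64_t slow_ns, uint64_t fast_ns)
{
	return ((double)slow_ns / (double)fast_ns);
}
```

ns_to_ms(PHASE1_WITHOUT_NCP_NS) comes to roughly 7732 ms, and slowdown(PHASE1_WITHOUT_NCP_NS, PHASE1_WITH_NCP_NS) is a little under 47, so disabling ncp made this Phase I roughly 47 times slower.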

So why is Niagara so slow without its crypto accelerator?

That's a good question. Keep in mind that four big-number operations occur in an IKE Phase I exchange like I measured above: one RSA Signature, one RSA Verification, one Diffie-Hellman generate, and one Diffie-Hellman agree. I'm not 100% sure, but I believe the default software implementation of big-number operations on SPARC uses floating-point tricks to help out. Using floating-point on a Niagara kicks in a software emulation, which would definitely increase the time taken for each bignum operation.

So the moral of the story is to make sure you're exploiting all of the hardware that's available to you!

Tunnel Reform Code Review starts now.

— Dan McD. @ October 9, 2006 12:51

Hey everyone!

The IPsec Tunnel Reform project's code review is now underway. Take a look and see what it took to bring up IPsec Tunnel-Mode processing in a world where tunnels are not actions from a policy, but rather a first-class network interface (or at least after Clearview it will be).

Highlights for administrators include:

Augmentations to ipsecconf(1m) to specify a tunnel interface's policy, whether it's S9-style IP-in-IP transport mode, or RFC 2401-compliant Tunnel Mode.

No changes to IKE configuration.

You can configure tunnel security without ifconfig(1m) using just ipsecconf(1m). We put all IPsec policy in ipsecconf(1m) and let ifconfig manage interfaces (and route(1m) manage routing).

Additions to ipseckey(1m) for manual tunnel-mode SA configuration, or monitoring of kernel interactions with Key Management.

Better interoperability with everyone else's Tunnel Mode IPsec.

Highlights for OpenSolaris-hackers include:

New per-tunnel policy structure: ipsec_tun_pol_t, which instantiates the existing policy-head per tunnel.

Getting rid of IRE_DB_REQ messages for SA addition/updates. This improves SA-adding performance and reduces the complexity of the ESP and AH modules.

New PF_KEY and PF_POLICY messages to reflect Tunnel Mode.

Shifting of tunnel IPsec policy enforcement from the lower-instance of IP to "tun" itself. (NOTE: This will change again when we merge with Clearview.)

Share your comments on tref-discuss, and let us know what you think!

This entry brought to you by the Technorati tags IPsec, Solaris, and OpenSolaris.

Tunnel Reform now open for your perusal

— Dan McD. @ July 13, 2006 16:22

NOTE: Links here point to docs that no longer exist. Maybe the Internet Archive might have 'em?

IPsec in Solaris has one missing piece, and we're about to put it in place.

The IPsec Tunnel Reform project aims to give Solaris and OpenSolaris an RFC 2401-compliant tunnel-mode implementation.

There are a lot of changes in the source base, some of which aren't open sourced (IKE), but most of which are in existing OpenSolaris code. The project page has a webrev showing the changes thus far. We're trying to be more open in our development processes here in the Solaris group, and showing you Tunnel Reform before we've finished it, AND before we've started major test efforts, is Team IPsec's own way of contributing to this openness.

Think of the source snapshot as a "Code Preview" instead of a "Code Review". There's a newly-rewhacked design document there too, and we'd like you to look at it and discuss it on the OpenSolaris communities or the tref-discuss@opensolaris.org mailing list.

And once we're done with this, we can think about RFC 4301 (2401's replacement) and friends, more zones support, SMF-izing things, giving TX labelled SA support... :)

This entry brought to you by the Technorati tags IPsec, Solaris, and OpenSolaris.

ESP without authentication considered harmful

— Dan McD. @ March 7, 2006 11:17

Hopefully you will read this and go "That's obvious". I'm writing this entry, however, for those who don't.

When IPsec was being specified over 10 years ago, attacks against cipher-block-chaining (CBC) encryption were understood. ESP has an authentication algorithm because AH had a vocal-enough opposition to merit having packet integrity in ESP also (there are also performance arguments for ESP-auth).

Now there are actual attacks with actual results. Kenny Paterson and Arnold Yau have published a paper with attacks against no-authentication ESP Tunnel Mode. I believe some of the techniques can also be employed against Transport Mode, but again, only with no authentication present.

The simple solution, of course, is to employ your choice of ESP Authentication (encr_auth_algs in ipsecconf(1m) or ifconfig(1m)) or AH (auth_algs in ipsecconf(1m) or ifconfig(1m)) with your IPsec deployment. We warn users about such configurations with ifconfig(1m) today. There is an RFE to eliminate or make very difficult encryption-only configurations in Solaris. Maybe someone in the OpenSolaris community would like to take a stab at it?

Put IPsec to work in YOUR application

— Dan McD. @ September 13, 2005 21:43

Hello coders!

Most people know that you can use ipsecconf(1m) to apply IPsec policy enforcement to an existing application. For example, if you wish to only allow inbound telnet traffic that's under IPsec protection, you'd put something like this into /etc/inet/ipsecinit.conf or other ipsecconf(1m) input:

# Inbound telnet traffic should be IPsec protected
{ lport 23 } ipsec { encr_algs any(128..) encr_auth_algs md5 sa shared}
    or ipsec { encr_algs any(128..) encr_auth_algs sha1 sa shared}

Combine that with appropriate IKE configuration or manual IPsec keys, and you can secure your telnet traffic against eavesdropping, connection hijacking, etc.

For existing services, using ipsecconf(1m) is the most expedient way to bring IPsec protection to bear on packets.

For new services, or services that are being modified anyway, consider using per-socket policy as an alternative. Some advantages to per-socket policy are:

Per-socket policy is stored internally in network session state (the conn_t structure in OpenSolaris). Entries from ipsecconf(1m) are stored in the global Security Policy Database (SPD). No global SPD entries means lower latency for fresh flow creation, and less lock acquisition.

Per-socket bypass means fewer bypass entries in global SPD. If I bypass remote-port 80 using ipsecconf(1m), I can, in theory, enter the system with a remote TCP packet with port=80. There's an RFE (6219908) to work around this, but per-socket is still quicker. I'd love a web proxy with the ability to set per-socket bypass.

The newly SMF-ized inetd(1m) would be a prime candidate for per-socket policy. See RFE 6226853, and this might be something someone in the OpenSolaris community would like to tackle!

Let's look at the ipsec_req_t structure that's been around since Solaris 8 in /usr/include/netinet/in.h:

/*
 * Different preferences that can be requested from IPSEC protocols.
 */

#define IP_SEC_OPT 0x22 /* Used to set IPSEC options */
#define IPSEC_PREF_NEVER 0x01
#define IPSEC_PREF_REQUIRED 0x02
#define IPSEC_PREF_UNIQUE 0x04
/*
 * This can be used with the setsockopt() call to set per socket security
 * options. When the application uses per-socket API, we will reflect
 * the request on both outbound and inbound packets.
 */

typedef struct ipsec_req {
	uint_t ipsr_ah_req; /* AH request */
	uint_t ipsr_esp_req; /* ESP request */
	uint_t ipsr_self_encap_req; /* Self-Encap request */
	uint8_t ipsr_auth_alg; /* Auth algs for AH */
	uint8_t ipsr_esp_alg; /* Encr algs for ESP */
	uint8_t ipsr_esp_auth_alg; /* Auth algs for ESP */
} ipsec_req_t;

The ipsec_req_t is a subset of what one can specify with ipsecconf(1m) in Solaris 9 or later, but it matched what one could do with Solaris 8's version. Algorithm values are derived from PF_KEY (see /usr/include/net/pfkeyv2.h for values), as below. One could also use getipsecalgbyname(3nsl). If I wish to set a socket to use ESP with AES and MD5, I'd set it up as follows:

	int s; /* Socket file descriptor... */

	ipsec_req_t ipsr;

 .....

	/* NOTE: Do this BEFORE calling connect() or accept() for TCP sockets. */
	ipsr.ipsr_ah_req = 0;
	ipsr.ipsr_esp_req = IPSEC_PREF_REQUIRED;

	ipsr.ipsr_self_encap_req = 0;
	ipsr.ipsr_auth_alg = 0;

	ipsr.ipsr_esp_alg = SADB_EALG_AES;
	ipsr.ipsr_esp_auth_alg = SADB_AALG_MD5HMAC;
	if (setsockopt(s, IPPROTO_IP, IP_SEC_OPT, &ipsr,
	    sizeof (ipsr)) == -1) {
		perror("setsockopt");
		bail(); /* Ugggh, we failed. */
	}
	/* You now have per-socket policy set. */

Notice I mentioned setting the socket option BEFORE calling connect() or accept()? This is because of a phenomenon we implement called connection latching. Basically, connection latching means that once an endpoint is connect()-ed, the IPsec policy (whether set per-socket or inherited from the state of the global SPD at the time) latches in place. We made this decision to avoid keeping policy-per-datagram state for things like TCP retransmits.

One thing per-socket policy does not address is the case of unconnected datagram services. In a perfect world, we could have IPsec policy information percolate all the way to the socket layer, where an application can make fully-informed per-datagram decisions on whether a particular packet was secured. It's a hard problem, requiring XNET sockets (to use sendmsg() and recvmsg() with ancillary data).

BTW, if you want to bypass whatever global entries are in the SPD, you can zero out the structure, and set all three (ah, esp, self_encap) action indicators to IPSEC_PREF_NEVER. You need to be privileged (root or "sys_net_config") to use per-socket bypass, however.
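
A minimal sketch of that bypass request follows. It assumes the ipsec_req_t layout and the IP_SEC_OPT and IPSEC_PREF_NEVER values quoted earlier from <netinet/in.h>. The fallback definitions exist only so the fragment also compiles on systems whose <netinet/in.h> lacks them, and the function names ipsec_req_bypass() and socket_set_ipsec_bypass() are hypothetical.

```c
#include <sys/types.h>
#include <sys/socket.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>

#ifndef IP_SEC_OPT
/*
 * Fallback copies of the Solaris definitions shown earlier, for systems
 * whose <netinet/in.h> does not provide them.
 */
#define	IP_SEC_OPT		0x22
#define	IPSEC_PREF_NEVER	0x01
typedef struct ipsec_req {
	unsigned int	ipsr_ah_req;
	unsigned int	ipsr_esp_req;
	unsigned int	ipsr_self_encap_req;
	uint8_t		ipsr_auth_alg;
	uint8_t		ipsr_esp_alg;
	uint8_t		ipsr_esp_auth_alg;
} ipsec_req_t;
#endif

/* Zero the request, then ask for "never" on AH, ESP and self-encap. */
static void
ipsec_req_bypass(ipsec_req_t *ipsr)
{
	(void) memset(ipsr, 0, sizeof (*ipsr));
	ipsr->ipsr_ah_req = IPSEC_PREF_NEVER;
	ipsr->ipsr_esp_req = IPSEC_PREF_NEVER;
	ipsr->ipsr_self_encap_req = IPSEC_PREF_NEVER;
}

/*
 * Apply bypass to socket s. Call this before connect() or accept() so the
 * policy is in place when the connection latches. Returns 0 on success or
 * -1 with errno set, exactly as setsockopt() does.
 */
static int
socket_set_ipsec_bypass(int s)
{
	ipsec_req_t ipsr;

	ipsec_req_bypass(&ipsr);
	return (setsockopt(s, IPPROTO_IP, IP_SEC_OPT, &ipsr, sizeof (ipsr)));
}
```

An unprivileged caller should expect the setsockopt() call to fail, so check the return value rather than assuming the bypass took effect.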

So modulo the keying problem (setting up IKE or having both ends agree on IPsec manual keys), you can put IPsec to work right in your application. In fact, if you use IKE, you can let IKE sort out permissions and access control (by using PKI-issued certificates, self-signed certificates, or preshared keys) and have policy merely determine the details of the protection required.

EDITED: This entry brought to you by the Technorati tags IPsec, Solaris, and OpenSolaris.

Sooner than I thought!

— Dan McD. @ August 31, 2005 08:31

Well, it looks like our friends on the OpenSolaris source team have been working more quickly than I thought. If you look at certain files (like the ESP source), they've been mysteriously fleshed-out. While crypto hasn't been sorted out, some other IPsec-specific files are now present in the tree!

Since crypto hasn't shown up yet, I guess we can't fully celebrate, but once it does, maybe Bill or I will have some more to say on the IPsec source.

COMING SOON: Full IPsec kernel code

— Dan McD. @ August 16, 2005 11:20

AWESOME... TOTALLY AWESOME!

Pardon my channelling of Jeff Spicoli, but Mike Kupfer is working on delivering the with-kernel-crypto build of OpenSolaris. This means that OpenSolaris users will have a working implementation of ESP at their fingertips. It also means Bill and I can (time permitting... uggh) give some more details and explanations about what's going on under the covers.

Watch this space for more!

PF_KEY in Solaris, or "Dude, Where's My Spec?"

— Dan McD. @ June 14, 2005 09:06

NOTE: I've updated the source pointers to point at the illumos source tree now. Some of the text in here is out of date, because it was from 2005.

An OpenSolaris IPsec Hello

Hi! Those of you visiting here probably know I'm one of the IPsec guys (actually, I'm the original IPsec guy) here in Solaris-land. Bill may also have some stuff to say about the IPsec source in Solaris.

The kernel source for IPsec (AH, ESP, and the internal databases) lives in usr/src/uts/common/inet/ip/, because we're an integral part of our IP implementation. I should warn you now that there's a mixture of STREAMS boundaries and function calls between the different parts of the IPsec subsystem. It used to be almost all STREAMS, because of broken US Export restrictions (across all political party lines, BTW). We figured we could sell it as exportable to the powers that be more easily if we used a "general-purpose interface" which allowed for easy module perforation for moving our data around. As the restrictions loosened up, we were able to streamline things somewhat. We hope to do even more now that OpenSolaris is available. There are bits and pieces of the actual Solaris IPsec missing from OpenSolaris (especially from ESP) that will show up on OpenSolaris soon as well, now that we're officially open-source. (It's a bit of a chicken & egg problem.)

This entry will be discussing the PF_KEY implementation in Solaris. I assume you know something about how IPsec works, have read RFC 2367, and have a handle on TCP/IP protocol suite principles.

A Brief PF_KEY Synopsis

PF_KEY is analogous to the PF_ROUTE routing socket. See Keith Sklower's Radix-Tree paper at his site for the introduction to routing sockets. Where the routing socket manipulates IP forwarding entries (or routes), the PF_KEY socket manipulates IPsec Security Associations (SAs). A user-space application sends a message to the kernel telling it to ADD, DELETE, or UPDATE SAs, and the kernel sends back a message indicating either success or failure.

The paper makes mention of a message that's little-used in most PF_ROUTE implementations -- RTM_RESOLVE. RTM_RESOLVE allows a user-space application to resolve an address, e.g. a user-space ARP. This inspired PF_KEY's similar message, SADB_ACQUIRE, which is used to tell a user-space key management (KM) daemon that an outgoing IPsec SA is needed. RFC 2367 has the specification for a PF_KEY socket.

Solaris Changes from RFC 2367

Most, if not all, existing PF_KEY implementations either alter or add to the message types in RFC 2367. Most changes were made because:

RFC 2367 does not mesh well with some KM protocols (esp. IKEv1).

The UNIX errno space is not sufficient to describe some failures.

Some implementors thought PF_KEY would be a suitable place to put their IPsec Security Policy Database (SPD) manipulations.

Solaris addresses the last bullet by introducing a separate PF_POLICY socket for SPD manipulation. The other issues, however, were a problem for us.

All of our changes to PF_KEY were a direct result of implementing The Internet Key Exchange (IKE) as part of our work in Solaris 9. They are summarized below:

Extended ACQUIRE - Instead of sending up a message for every IPsec SA that is not present, send up a list of what is needed to protect the packet to the listening KM daemon. This allows a packet that requires AH and ESP to express that protection in one ACQUIRE message.

Extended REGISTER - Goes hand-in-hand with Extended ACQUIRE. You tell the kernel that you can handle the Extended ACQUIRE.

Inverse ACQUIRE - The closest to policy manipulation we come in PF_KEY... it's a one-time consultation of the IPsec SPD, and you get as an answer an Extended ACQUIRE, just as if an outbound packet was triggering it. This is useful for IKE responders, and for diagnostic listeners on the PF_KEY socket.

Diagnostic codes - EINVAL is a frequently occurring value for sadb_msg_errno. Was it a weak DES key? Was it a botched sockaddr structure? The reserved field in struct sadb_msg now contains useful extra data when an EINVAL occurs.

typedefs - It's easier to type sadb_ext_t instead of struct sadb_ext.

64-bit alignment - RFC 2367 claims all PF_KEY structures can be aligned on 64-bit boundaries. In Solaris, we force it to happen. That's why net/pfkeyv2.h has a lot of unions in most of its structure definitions.

But Dan... weren't you an author of RFC 2367?!?

Yes I was. Hence the question: Dude, Where's My Spec?

I wasn't allowed (yes, I'm serious; and no, it had nothing to do with any government interference) to work on IPsec or IKE when I first got to Sun, but the RFC was work that was a continuation from my previous job. In hindsight, I think we should've been paying more attention to the customers (authors of KM daemons, of which I'd be one someday). I was wrapped up in non-IPsec work at Sun when I wasn't working on what would become RFC 2367, and I split my attention in a non-optimal fashion.

Enough yapping, let's see some code!

The first place to look is usr/src/uts/common/net/pfkeyv2.h, which gets deposited into /usr/include/net/ on a running system. You'll notice every structure that doesn't have a field of type uint64_t will have a union in it. Here's the base PF_KEY message:

typedef struct sadb_msg {
	uint8_t sadb_msg_version; /* Version, currently PF_KEY_V2 */
	uint8_t sadb_msg_type; /* ADD, UPDATE, etc. */
	uint8_t sadb_msg_errno; /* Error number from UNIX errno space */
	uint8_t sadb_msg_satype; /* ESP, AH, etc. */
	uint16_t sadb_msg_len; /* Length in 64-bit words. */

	uint16_t sadb_msg_reserved; /* must be zero */
	/*
	 * Use the reserved field for extended diagnostic information on errno
	 * responses.
	 */
#define	sadb_x_msg_diagnostic sadb_msg_reserved
	/* Union is for guaranteeing 64-bit alignment. */
	union {
		struct {
			uint32_t sadb_x_msg_useq; /* Set by originator */
			uint32_t sadb_x_msg_upid; /* Set by originator */
		} sadb_x_msg_actual;
		uint64_t sadb_x_msg_alignment;
	} sadb_x_msg_u;
#define sadb_msg_seq sadb_x_msg_u.sadb_x_msg_actual.sadb_x_msg_useq
#define sadb_msg_pid sadb_x_msg_u.sadb_x_msg_actual.sadb_x_msg_upid
} sadb_msg_t;

Notice that every extra field that is not in RFC 2367 uses the _X_ naming convention. In the case above, we took the two uint32_ts and merged them into a union with a uint64_t so that we can force 64-bit alignment on sadb_msg_t. This makes PF_KEY message manipulations 64-bit happy.
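
The effect of the union can be checked at compile time. The sketch below is a stand-alone mirror of the layout above, not the real header: every sketch_ and sm_ name is hypothetical. It asserts that the base message occupies exactly two 64-bit words, and shows the words-to-bytes conversion that a length field counted in 64-bit words implies.

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-alone mirror of the sadb_msg layout; not the real header. */
typedef struct sketch_msg {
	uint8_t		sm_version;
	uint8_t		sm_type;
	uint8_t		sm_errno;
	uint8_t		sm_satype;
	uint16_t	sm_len;		/* length in 64-bit words */
	uint16_t	sm_reserved;
	union {
		struct {
			uint32_t sm_useq;
			uint32_t sm_upid;
		} sm_actual;
		uint64_t sm_alignment;	/* never read; forces alignment */
	} sm_u;
} sketch_msg_t;

/* Unit conversions between 64-bit words and bytes. */
#define	SKETCH_64TO8(words)	((size_t)(words) << 3)
#define	SKETCH_8TO64(bytes)	((size_t)(bytes) >> 3)

_Static_assert(sizeof (sketch_msg_t) == 16,
    "base message must be exactly two 64-bit words");
_Static_assert(offsetof(sketch_msg_t, sm_u) == 8,
    "union must start on a 64-bit boundary within the struct");

/* Fill in sm_len for a message consisting of the base header only. */
static void
sketch_msg_set_base_len(sketch_msg_t *msg)
{
	msg->sm_len = (uint16_t)SKETCH_8TO64(sizeof (*msg));
}
```

Because the struct size is a whole number of 64-bit words, an extension placed directly after the base header starts on a 64-bit boundary whenever the message buffer itself does.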

If you look at an extension that has a 64-bit type in it already, you'll see that there's no alignment-forcing union inserted into the definition:

typedef struct sadb_lifetime {
	uint16_t sadb_lifetime_len;
	uint16_t sadb_lifetime_exttype; /* SOFT, HARD, CURRENT */
	uint32_t sadb_lifetime_allocations;
	uint64_t sadb_lifetime_bytes;
	uint64_t sadb_lifetime_addtime; /* These fields are assumed to hold */
	uint64_t sadb_lifetime_usetime; /* >= sizeof (time_t). */
} sadb_lifetime_t;

(Yes, PF_KEY is Y2038-ready, as long as the KM application is compiled as 64-bit.)

For people interested in applications that use PF_KEY, I'd suggest investigating the ipseckey(1m) command. Its source can be found in usr/src/cmd/cmd-inet/usr.sbin/ipseckey.c. This program does not use command-line editing yet, but as an example of a PF_KEY consumer, it does the job.

All relevant PF_KEY internal-implementation headers live in usr/src/uts/common/inet. Relevant header files are:

ipsec_info.h
- The structures for M_CTL messages prepended to data that gets passed around between IPsec STREAMS modules. The keysock consumer interface definitions are important here. Note the keysock_in_t, especially ks_in_extv[]. This vector allows easy access to all PF_KEY extension headers. (As we remove STREAMS from IPsec, this file will shrink. If all goes well, all that will remain (in some form) are IPSEC_IN and IPSEC_OUT messages, and that's because you can't enforce policy without some form of packet tagging.)

keysock.h
- The keysock driver implements the PF_KEY socket interface at its most basic. A keysock_t represents an open PF_KEY socket, and a keysock_consumer_t represents a consumer of PF_KEY messages (i.e. AH and ESP).

All relevant PF_KEY internal-implementation source lives one level down from the headers, in the ip/ directory. They are:

keysockddi.c
- DDI and module loading glue.

keysock.c
- STREAMS driver that implements the PF_KEY socket as /dev/keysock. It is also a STREAMS module that sits atop AH and ESP listening for their messages.

sadb.c
- Where IPsec's Security Association Database (SADB) is mostly implemented. You'll notice an ip_sadb.c file, because we want the fast-path lookups to be in IP without going through modstubs.

Most of this entry will be spent in keysock.c. Other portions of the subsystem will either be visited by one of Team IPsec when we have cycles, or if there are enough requests.

keysock either handles messages from a PF_KEY socket, or from the SADB and its consumers. The heavy lifting for user-generated PF_KEY messages is in keysock_parse().

The first thing keysock_parse() does is perform some reality checks. First off, the actual data length should match what's in the sadb_msg_len field. (Note the first of many SADB_nnTOmm() macros, for converting units of 64-bits to units of 8-bits, etc.) The first really interesting part of reality-checking is the "extension vector", or as it's called in the IPsec code, the extv. A PF_KEY message has a base header, followed by one or more extension headers. Let me quote this section from RFC 2367:

There MUST be only one instance of a extension type in a message. (e.g.
Base, Key, Lifetime, Key is forbidden). An EINVAL will be returned if there
are duplicate extensions within a message.

The keysock code takes this to heart in keysock_get_ext(). The extv is a vector of sadb_ext_t pointers, where the specific extension type (SADB_EXT_foo) can be found by merely indexing into the vector by that value. Say you want the sadb_sa_t extension:

sadb_sa_t *sa;

sa = (sadb_sa_t *)extv[SADB_EXT_SA];

The above code snippet shows what you need to do. As we generate the extv, if we see a collision, we return EINVAL (with an appropriate diagnostic). We do not enforce extension ordering inbound or outbound. Once keysock is done with first-pass reality-checks, the extv is sent around (as part of the KEYSOCK_IN M_CTL that is prepended to the data) to all who need it.
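
Building such a vector can be sketched in plain C. This is an illustration of the idea rather than the keysock_get_ext() source: sketch_ext_t, SKETCH_EXT_MAX, and sketch_build_extv() are hypothetical names, the extension area is taken to be an array of 64-bit words, and the diagnostic code that accompanies EINVAL is left out.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define	SKETCH_EXT_MAX	32	/* hypothetical highest extension type */

/* Common prefix of every extension: length, then type (host byte order). */
typedef struct sketch_ext {
	uint16_t se_len;	/* length in 64-bit words, prefix included */
	uint16_t se_type;	/* 1..SKETCH_EXT_MAX; 0 is reserved */
} sketch_ext_t;

/*
 * Walk the extensions that follow the base header and index them by type.
 * words/nwords describe the extension area in 64-bit units. Returns 0 on
 * success, or EINVAL for a bad length, an unknown type, or a duplicate.
 */
static int
sketch_build_extv(const uint64_t *words, size_t nwords,
    const uint64_t *extv[SKETCH_EXT_MAX + 1])
{
	size_t off = 0;

	for (size_t i = 0; i <= SKETCH_EXT_MAX; i++)
		extv[i] = NULL;

	while (off < nwords) {
		sketch_ext_t hdr;

		(void) memcpy(&hdr, &words[off], sizeof (hdr));
		if (hdr.se_len == 0 || hdr.se_len > nwords - off)
			return (EINVAL);	/* runs past the message */
		if (hdr.se_type == 0 || hdr.se_type > SKETCH_EXT_MAX)
			return (EINVAL);	/* type we cannot index */
		if (extv[hdr.se_type] != NULL)
			return (EINVAL);	/* collision: seen already */
		extv[hdr.se_type] = &words[off];
		off += hdr.se_len;
	}
	return (0);
}
```

Note that the walk accepts extensions in any order, and a missing extension simply leaves a NULL slot for the consumer to check.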

Most messages are shuttled off to keysock_passdown() for sending off via STREAMS to either AH or ESP. Unusual inbound messages for keysock are the SADB_REGISTER, SADB_FLUSH, SADB_DUMP, SADB_ACQUIRE, and SADB_X_INVERSE_ACQUIRE ones.

SADB_REGISTER sets socket state, as well as informs consumers about the register. If an SADB_REGISTER is sent for a specific SA type (e.g. ESP, AH), then the message is treated like a common-case message, except that when its reply arrives, keysock_t state is altered to indicate a registered socket. If the sadb_msg_satype is set to 0, then the message is an EXTENDED register, and an extended-register extension (sadb_x_ereg_t) is required. keysock converts the 0-terminated list of one-byte values into a bit vector internally. Then it sends the message to consumers like a normal REGISTER.
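
The list-to-bit-vector conversion is easy to picture with a small sketch. The width and layout of keysock's real bit vector are not described here, so the 64-bit vector and the name sketch_ereg_to_bits() are assumptions made purely for illustration.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Fold a 0-terminated list of one-byte SA types into a bit vector, one bit
 * per SA type. Returns 0 on success, or EINVAL if the list has no terminator
 * within len bytes or names an SA type that does not fit in 64 bits.
 */
static int
sketch_ereg_to_bits(const uint8_t *satypes, size_t len, uint64_t *bits)
{
	uint64_t vec = 0;

	for (size_t i = 0; i < len; i++) {
		if (satypes[i] == 0) {
			*bits = vec;	/* terminator reached: done */
			return (0);
		}
		if (satypes[i] >= 64)
			return (EINVAL);
		vec |= (uint64_t)1 << satypes[i];
	}
	return (EINVAL);	/* ran off the end without a terminator */
}
```

For example, a list holding the RFC 2367 values for AH (2) and ESP (3) followed by the terminator sets bits 2 and 3 of the vector.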

An inbound SADB_ACQUIRE can be used to signal other KM applications. (If PF_KEY is used to keep keys in the kernel for user-space consumers.) The more common case, however, is a negative ACQUIRE, which means a KM negotiation failed and the internal ACQUIRE record (more on this in a bit) needs to be cancelled.

SADB_FLUSH and SADB_DUMP messages need to lock down the keysock module until their respective operations are finished. FLUSH doesn't take as long, but DUMP needs to keep track of all consumer-originated replies until the consumer indicates it is done.

SADB_X_PROMISC merely changes some keysock state. It never goes to a consumer.

The SADB_X_INVERSE_ACQUIRE handling is a glimpse of things to come for keysock. It does not use the keysock_passdown() method of calling a consumer. It instead calls directly into IPsec (and if we had other in-kernel consumers, it would directly call to those) and returns a message to the user immediately.

The keysock_rput() function handles all messages from consumers. The KEYSOCK_OUT portion of the switch checks for FLUSH and DUMP messages, and releases the clamps on keysock if the final message for a FLUSH or a DUMP has been received. Otherwise, the keysock_passup() does the work.

keysock_passup() is conceptually much simpler than keysock_parse(). It merely has to make one of three delivery decisions:

Do I deliver to all PF_KEY sockets?

Do I deliver to all registered PF_KEY sockets?

Do I deliver merely to the sender?

The more interesting parts of PF_KEY are handled inside the SADB code (sadb.c) and in the consumers. Those will be the subject of one or more other entries, because of all of the interaction with the IPsec SADB.

This entry was brought to you by the Technorati Tags OpenSolaris and Solaris.