Kebe Says - Dan McDonald's Blog

Home Data Center 3.0 -- Part 2: HDC's many uses

In the prior post, I mentioned a need for four active Ethernet ports. These four ports are physical links to four distinct Ethernet networks. Joyent's SmartOS and Triton characterize these with NIC Tags; I just view them as distinct networks. On HDC 3.0 they are all driven by the illumos igb(7d) driver (hmm, that man page needs updating), and I'll specify them now:

  • igb0 - My home network.
  • igb1 - The external network. This port is directly attached to my FiOS Optical Network Terminal's Gigabit Ethernet port.
  • igb2 - My work network. Used for my workstation, and "external" NIC Tag for my work-at-home Triton deployment, Kebecloud.
  • igb3 - Mostly unused for now, but connected to Kebecloud's "admin" NIC Tag.
The zones abstraction in illumos allows not just containment, but a full TCP/IP stack to be assigned to each zone. This makes a zone feel more like a proper virtual machine in most cases. Many illumos distros are able to run a full VMM as the only process in a zone, which ends up delivering a proper virtual machine. As of this post's publication, however, I'm only running illumos zones, not full VM ones. Here's their list:
(0)# zoneadm list -cv
  ID NAME             STATUS     PATH                           BRAND    IP    
   0 global           running    /                              ipkg     shared
   1 webserver        running    /zones/webserver               lipkg    excl  
   2 work             running    /zones/work                    lipkg    excl  
   3 router           running    /zones/router                  lipkg    excl  
   4 calendar         running    /zones/calendar                lipkg    excl  
   5 dns              running    /zones/dns                     lipkg    excl  
(0)# 
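For the curious, here is a sketch of how one of these exclusive-IP zones gets its network link. The vnic and zone names below are invented for illustration, but the dladm(1M)/zonecfg(1M) pattern is the standard one:

    (0)# dladm create-vnic -l igb0 webserver0   # vnic over the home-network port
    (0)# zonecfg -z webserver
    zonecfg:webserver> set ip-type=exclusive
    zonecfg:webserver> add net
    zonecfg:webserver:net> set physical=webserver0
    zonecfg:webserver:net> end
    zonecfg:webserver> commit

With the vnic assigned, the zone manages its own addresses, routes, and IPsec policy with no help from the global zone.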
Their zone names correspond to their jobs:
  • global - The illumos global zone is what exists even in the absence of other zones. Some illumos distros, like SmartOS, encourage minimizing what a global zone has for services. HDC's global zone serves NFS and SMB/CIFS to my home network, and has the primary link into the home network. HDC's global zone has no default route, so any operation that needs out-of-the-house networking either goes through another zone (e.g. DNS lookups) or requires temporarily adding a default route (e.g. NTP chimes, `pkg update`).
  • webserver - Just like the name says, this zone hosts the web server for kebe.com. It uses lofs(7FS), the loopback virtual file system, to inherit subdirectories from the global zone. I edit blog entries (like this one) for this zone via NFS from my laptop: the global zone serves NFS, and the files I'm editing are not only available in the global zone, but are also lofs-mounted into the webserver zone. The webserver zone has a vnic (see here for details about a vnic, the virtual network interface controller) link to the home network and a default route, and the router zone's NAT (more later) forwards ports 80 and 443 to this zone. Additionally, the home network DHCP server lives here, for no other reason than, "it's not the global zone."
  • work - The work zone is new in the past six years, and as of recently eschews lofs(7FS) in favor of delegated ZFS datasets. A delegated ZFS dataset, a proper filesystem in this case, is assigned entirely to the zone. This zone also has the primary (and only) link to the work network, a physical connection (for now unused) to my work Triton's admin network, and an etherstub vnic (see here for details about an etherstub) link to the router zone. The work zone is itself a router for work-network machines (and serves DNS for the work network), but since I only have one public IP address, I use the etherstub to link it to the router zone. As of recent illumos builds, the zone can even serve its own NFS. This allows even less global-zone participation with work data, and it means work machines do not need backchannel paths to the global zone for NFS service. The work zone has a full illumos development environment on it, and performs builds of illumos rather quickly. It also has its own Unbound (see the dns zone below) for the work network.
  • router - The router zone does what the name says. It has a vnic link to the home network and the physical link to the external network. It runs ipnat to NAT etherstub work traffic or home network traffic out to the Internet, and redirects well-known ports to their respective zones. It does not use a proper firewall, but has IPsec policy in place to drop anything not matched by ipnat, because in a no-policy situation, ipnat lets unmatched packets reach the local zone. The router zone also runs the (alas, still closed-source) IKEv1 daemon to allow me remote access to this server while I'm away. It uses an old test tool from the pre-Oracle Sun days that a few of you half-dozen readers will know by name. We have a larval IKEv2 out in the community, and I'll gladly switch to that once it's available.
  • calendar - Blogged about when first deployed, this zone's sole purpose is to serve our calendar both internally and externally. It uses the Radicale server. Many of my complaints from the prior post have been alleviated by subsequent updates. I wish the authors understood interface stability a bit better (jumping from 2.x to 3.0 was far more annoying than it needed to be), but it gets the job done. It has a vnic link to the home network, a default route, and gets calendaring packets shuffled to it by the router zone so my family can access the calendar wherever we are.
  • dns - A recent switch to OmniOSce-supported NSD and Unbound encouraged me to bring up a dedicated zone for DNS. I run both daemons here, and have the router zone redirect public kebe.com requests here to NSD. The Unbound server serves all networks that can reach HDC. It has a vnic link to the home network, and a default route.
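To make the router zone's job concrete, here is a hedged sketch of what its ipnat configuration might look like. The addresses and the internal target of the port redirections are invented for illustration, not copied from my real config:

    # Illustrative /etc/ipf/ipnat.conf fragment -- addresses invented.
    map igb1 192.168.1.0/24 -> 0/32 portmap tcp/udp auto   # NAT home traffic out
    map igb1 192.168.2.0/24 -> 0/32 portmap tcp/udp auto   # NAT etherstub work traffic
    rdr igb1 0.0.0.0/0 port 80  -> 192.168.1.10 port 80 tcp   # to the webserver zone
    rdr igb1 0.0.0.0/0 port 443 -> 192.168.1.10 port 443 tcp  # to the webserver zone

The "map" rules rewrite outbound source addresses to the single public IP on igb1, while the "rdr" rules shuffle inbound well-known ports to the appropriate zone.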

The first picture shows HDC as a single entity, and its physical networks. The second picture shows the zones of HDC as Virtual Network Machines, which should give some insight into why I call my home server a Home Data Center.

HDC, physically HDC, logically

I Have No Whistle to Blow, But I Must Scream

I'm sure all twelve of you readers out there know what's been going on with respect to recent revelations about NSA activity. Among them is the unnerving discovery that the NSA has been actively trying to dumb down security for the Internet.

In the second linked article, Bruce Schneier calls upon people to blow the whistle on, "how the NSA and other agencies are subverting routers, switches, the internet backbone, encryption technologies and cloud systems." Here's the deal:

I have never been asked to introduce back-doors or weaken security in Solaris, OpenSolaris, Oracle Solaris 11 (for the four months I worked on it post-barn-door-closing), or Illumos. If there are weaknesses there, they are not there because of any deliberate effort on my part.

You can view the kernel IPsec protocol sources (AH & ESP) here, by looking at ipsec*.c, sadb.c, spd.c, spdsock.c, keysock.c and header files in the directory above it. You can see the IPsec management utilities here. According to at least one well-known security researcher, the Illumos (nee OpenSolaris) IPsec code isn't bollocks.

There is no open source for IKE, because the libike.so.1 library was mostly OEM code, from a vendor whose technical lead let me co-write an RFC with him. You can, however, use the various observability and debugging tools in Illumos to see how things work, if you wish.

If you want to write your own, better, key management application for Illumos (or even Oracle Solaris), you can use PF_KEY to control the IPsec SADB. I detail the subsequent additions to RFC 2367 in my day-one-of-OpenSolaris blog post. If you want to work on IPsec in totally-open-source Illumos, you have my blessing, and I'll definitely be reviewing (and maybe integrating, if you pass code reviews) your code.
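If you just want to observe PF_KEY traffic before writing a consumer of your own, ipseckey(1M) can show you both the SADB and the messages crossing the PF_KEY socket. A quick sketch:

    (0)# ipseckey dump      # show the current SADB contents
    (0)# ipseckey monitor   # print PF_KEY messages as they happen

Watching "ipseckey monitor" while key management runs is a cheap way to learn the message flow before writing your own PF_KEY application.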

MAC-then-encrypt - also harmful, also hard to do in Solaris

Hello again!

Kenny Paterson's once again turning the theoretical into practical. This time he's pointed out that if one configures IPsec to MAC-then-encrypt (do packet authentication first, THEN encrypt the packet), one is open to cryptographic attack. Here's a citation for his ACM CCS paper.

The good news is that we cannot configure the IPsec SPD to perform MAC-then-encrypt at all. One could configure transport mode to just MAC, then have the packet transit a tunnel that just encrypts, but then you'll see warnings about the encryption-only tunnel configuration. This has been true for a LONG time (starting with S9, maybe even S8).
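For illustration, here is roughly what the sane combined-mode configuration looks like in ipsecinit.conf, next to the encryption-only form that draws the warning. The addresses and algorithm choices are invented for this example:

    # ESP with both encryption and authentication -- no warning:
    {laddr 10.1.1.1 raddr 10.1.1.2} ipsec {encr_algs aes encr_auth_algs sha256}

    # ESP with encryption only -- accepted, but ipsecconf(1M) warns:
    {laddr 10.1.1.1 raddr 10.1.1.2} ipsec {encr_algs aes}

The combined form does authentication inside ESP after encryption, which is exactly the ordering Paterson's attack does not apply to.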

So basically, we don't make it easy for you to shoot yourself in the foot this way. You really have to try, and as I pointed out earlier, the encryption-only part will warn you.

IKEv2 project page updated

The IKEv2 project page on opensolaris.org now has links to both an early-revision design document, and a webrev pointer.

OpenSolaris works out of the box with Amazon Virtual Private Cloud

Glenn Brunette asked me if OpenSolaris could access the Amazon Virtual Private Cloud or not. I told him it had better, or else there was a bug. He then did some scripting work, got some BGP help from Sowmini, and consulted Sebastien on some tunneling details. It's now up, running, and in a nice package, ready to use.

IKEv2 project now on OpenSolaris

The IKEv2 project page is now available here on OpenSolaris. There's mailing-list information and a brief hello. We are working on design-level issues right now and some larval code, so c'mon over as we start to fire this up.

New IPsec goodies in S10u7

Hello again. Pardon any latency. This whole Oracle thing has been a bit distracting. Never mind figuring out the hard way what limitations there are on racoon2 and what to do about them.

Anyway, Solaris 10 Update 7 (a.k.a. 5/09) is now out. It contains a few new IPsec features that have been in OpenSolaris for a bit. They include:
  • HMAC-SHA-2 support per RFC 4868 in all three sizes (SHA-256, SHA-384, and SHA-512) for IPsec and IKE.
  • 2048-bit (group 14), 3072-bit (group 15), and 4096-bit (group 16) Diffie-Hellman groups for IKE. (NOTE: Be careful running the 3072- or 4096-bit groups on Niagara 1 hardware; see here for why. Niagara 2 works better, but not optimally, with those two groups.)
  • IKE Dead Peer Detection
  • SMF Management of IPsec. Four new services split out from network/initial:
    • svc:/network/ipsec/ipsecalgs:default -- Sets up IPsec kernel algorithm mappings.
    • svc:/network/ipsec/policy:default -- Sets up the IPsec SPD (reads /etc/inet/ipsecinit.conf).
    • svc:/network/ipsec/manual-key:default -- Reads any manually-added SAs (reads /etc/inet/secret/ipseckeys).
    • svc:/network/ipsec/ike:default -- Controls the IKE daemon.
  • The UDP_NAT_T_ENDPOINT socket option from OpenSolaris, so you can develop your own NAT-Traversing IPsec key management apps without relying on in.iked.
We've got even more goodies in OpenSolaris, BTW.
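As a quick sketch of the new SMF plumbing (the service names come from the list above; the rest is ordinary svcs/svcadm usage):

    (0)# svcs "svc:/network/ipsec/*"     # list the four IPsec services
    (0)# svcadm refresh ipsec/policy     # re-read /etc/inet/ipsecinit.conf
    (0)# svcadm enable ipsec/ike         # start the IKE daemon

Splitting these out of network/initial means you can restart or refresh IPsec policy without touching the rest of network bring-up.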

How to tell when a performance project succeeds?

The Volo project is an effort to improve the interface between sockets and any socket-driven subsystems, including the TCP/IP stack. During their testing, they hit a kernel panic in some IPsec tests. See this bug for what they reported.

In our IPsec, we have LARVAL IPsec security associations (SAs). These are SAs that reserve a unique 32-bit Security Parameters Index (SPI), but have no other data. If a packet arrives for a LARVAL SA, we queue it up, so that when it gets filled in by key management, the packet can go through. We do this because of the IKE Quick Mode Exchange, which looks like this:

        INITIATOR                               RESPONDER
        ---------                               ---------

        IKE Quick Mode Packet #1  -------->

                                  <--------     IKE Quick Mode Packet #2

        IKE Quick Mode Packet #3  -------->
Now once the initiator receives Quick Mode packet #2, it has enough information to transmit an IPsec-protected packet. Unfortunately, the responder cannot complete its Security Association entries until it receives packet #3. It is possible, then, that the initiator's IPsec packet may arrive before the responder has finished processing IKE. Let's look at the packets again:
        INITIATOR                               RESPONDER
        ---------                               ---------

        IKE Quick Mode Packet #1  -------->

                                  <--------     IKE Quick Mode Packet #2

        ESP or AH packet          -------->     Does this packet...

        IKE Quick Mode Packet #3  -------->     ... get processed after my
                                                receipt of #3, also after
                                                which I SADB_UPDATE my
                                                inbound SA, which changes it
                                                from LARVAL to MATURE?
Now the code that queues up an inbound IPsec packet for a LARVAL SA is sadb_set_lpkt(), as shown in the bug's description. It turns out there was a locking bug in this function - and we even had an ASSERT() that the SA manipulated by sadb_set_lpkt() was always larval. The problem was that we discounted the possibility of IKE finishing between the detection of a LARVAL SA and the actual call to sadb_set_lpkt().

The Volo project improved UDP latency enough that the IKE packet wormed its way up the stack and into in.iked faster than the concurrent ESP or AH packet. The aforementioned ASSERT() tripped during Volo testing because we did not check the SA's state while holding its lock. Had we done so, we would have seen that the LARVAL SA had been promoted to MATURE, and we could have gone ahead and processed the packet.

This race condition had been present since sadb_set_lpkt() was introduced in Solaris 9, but it took Volo's improved performance to expose it. So hats off to Volo for speeding things up enough to find long-dormant race conditions!



Addendum - IKEv2 does not have this problem because its equivalent to v1's Quick Mode is a simpler request/response exchange, so the responder is ready to receive when it sends the response back to the initiator.

Racoon2 on OpenSolaris - first tiny steps

NOTE: A version of this was sent to the racoon2-users alias also.

I've been spending some of my time bringing up racoon2 (an IKEv2 and IKEv1 daemon) on OpenSolaris.

Because of vast differences in PF_KEY implementations between OpenSolaris and other OS kernels, I've spent my racoon2 time actually getting IKEv1 to work first, instead of IKEv2. Right now, what's working is:
  • IKEv1 initiates and derives IPsec SAs for single-algorithm IPsec policies.
That's it! The IKEv1 responder needs work, as does all of IKEv2, as does support for multiple-choice algorithm proposals. But there's enough change in there to say something now.

ARCHITECTURAL DIFFERENCES

The most noteworthy change in the OpenSolaris work so far is that no spmd (the separate IPsec SPD daemon racoon2 uses) is required at all, at least for now. This is because:
  • We don't have the indirection between ACQUIREs and the appropriate policy entry. Our extended ACQUIREs contain everything needed to construct a proposal. There's no SPD consultation required with an OpenSolaris ACQUIRE.
  • Our responder-side logic uses inverse-ACQUIRE, which will provide the same structure as ACQUIRE w.r.t. proposal construction. This is the closest we get to needing something like spmd, and given its syntactic equality to an extended ACQUIRE, we can use it on rekeying if the responder initiates the next time.
If spmd serves another purpose, we will revisit it. As it stands, however, I cannot see us using it.

CODE DIFFERENCES

In OpenSolaris, we use the "webrev" tool to generate easy-to-review web pages with diffs of all varieties. The webrev for what I have so far in racoon2 is available at:
http://cr.opensolaris.org/~danmcd/racoon2-opensolaris/
Feel free to make comments or suggestions about what I've done.

IPsec Tunnel Reform, IP Instances, and other new-in-S10 goodies

Solaris 10 Update 4 (or as marketing calls it, Solaris 10 08/07) contains some backported goodies we've had in Nevada/OpenSolaris for a while.

IPsec Tunnel Reform was one of the first big pieces of code to be dropped into the S10u4 codebase. It shores up our interoperability story, so you can now start constructing VPNs that tell IKE to negotiate Tunnel Mode (as opposed to IP-in-IP transport mode). Tunnels themselves are still network interfaces, but their IPsec configuration is now wholly in the purview of ipsecconf(1M). Modulo IKE (which we still OEM part of), we developed Tunnel Reform in the open with OpenSolaris.
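To give a flavor of the post-Tunnel-Reform world, here is a hedged sketch of a tunnel-mode policy entry in ipsecinit.conf. The tunnel name, subnets, and algorithms are invented for this example:

    # Negotiate tunnel-mode ESP for traffic between two invented subnets:
    {tunnel ip.tun0 negotiate tunnel laddr 10.1.1.0/24 raddr 10.2.2.0/24}
        ipsec {encr_algs aes encr_auth_algs sha1 sa shared}

The point of Tunnel Reform is visible right there: the tunnel interface still exists, but its IPsec behavior is expressed in ordinary ipsecconf(1M) policy language.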

Also new for S10u4 is IP Instances. Before u4, you could create non-global zones, but their network management (e.g. ifconfig(1M)) had to be done from the global zone. With u4, one can create a zone with a unique IP Instance, which gives the zone its own complete TCP/IP stack. The global zone only needs to assign a GLDv3-compatible interface (e.g. bge, nge, e1000g) to the zone to give it a unique IP Instance. You could have a single box be your router/firewall/NAT, your web server, and who knows what else, all while keeping those functions out of the fully-privileged global zone. It makes me think about upgrading to business-class Internet service at home, building my own box like Bart did, and getting a few extra Ethernet ports.
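A sketch of what handing a GLDv3 NIC to a zone looks like (the zone and interface names here are invented):

    (0)# zonecfg -z webzone
    zonecfg:webzone> set ip-type=exclusive
    zonecfg:webzone> add net
    zonecfg:webzone:net> set physical=bge1
    zonecfg:webzone:net> end
    zonecfg:webzone> commit

Once the zone boots, ifconfig(1M), routing, and the rest of TCP/IP administration happen entirely inside the zone.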

Oh, and if you want to do it all with fewer Ethernet ports, check out OpenSolaris's Crossbow and its VNIC abstraction!

Have fun moving your network bits in new and interesting ways!

Dan's blog is powered by blahgd