Kebe Says - Dan McDonald's Blog

More ZFS Love - Rapid Recovery

I recently scragged my laptop's primary root partition such that I needed to install-from-scratch again. I had a bootable secondary root, but since it was running an experimental BFUed build, this partition could not be upgraded.
Let's quickly look at how I configure my 100-decimal-GB laptop disk (&*%$% disk vendors):
  • c0d0s0 --> Primary root, approx 8GB (and I mean GB the way software geeks mean it, 8 * 1024^3).
  • c0d0s1 --> Secondary root, same size.
  • c0d0s3 --> swap, 3GB, same as main memory size (useful for system dumps).
  • c0d0s7 --> ZFS pool "tank", with 5 ZFS filesystems (tank, CSW, spro, local, and danmcd).
Before I shut it down for the upgrade, I simply uttered this:
zpool export tank
That's it!

Then I plugged my laptop in to a local netinstall network, and PXE-booted to a Nevada build 73 install (which includes detangled NAT-Traversal) and started it up. I used the old Solaris installer because I know how to tell it to preserve disk slices. I told it to preserve the secondary root and the zpool.

One install later, I get root, and to recover my miscellaneous backups, CSW software, compilers, local binaries, and home directory, I just did:
zpool import tank
And again, that's it! All of my filesystems got mounted properly, no tables to edit, NOTHING.
I'll be at the University of Michigan Engineering Career Fair this coming Tuesday, and will be wandering campus on Monday. If you're one of the four people who read this blog and are there, drop by the Sun table - and see the very laptop I'm talking about. :)

IPsec Tunnel Reform, IP Instances, and other new-in-S10 goodies

Solaris 10 Update 4 (or as marketing calls it, Solaris 10 08/07) contains some backported goodies we've had in Nevada/OpenSolaris for a while.

IPsec Tunnel Reform was one of the first big pieces of code to be dropped into the S10u4 codebase. It shores up our interoperability story, so you can now start constructing VPNs that tell IKE to negotiate Tunnel-Mode (as opposed to IP-in-IP transport mode). Tunnels themselves are still network interfaces, but their IPsec configuration is now wholly in the purview of ipsecconf(1M). Modulo IKE (which we still OEM part of), we developed Tunnel Reform in the open with OpenSolaris.

Also new for S10u4 is IP Instances. Before u4, you could create non-global zones, but their network management (e.g. ifconfig(1M)) had to be done from the global zone. With u4, one can create a unique instance zone which gives the zone its own complete TCP/IP stack. The global zone only needs to assign a GLDv3-compatible interface to the zone (e.g. bge, nge, e1000g) to give it a unique IP Instance. You could have a single box be your router/firewall/NAT, your web server, and who knows what else, all while keeping those functions out of the fully-privileged global zone. It makes me think about upgrading to business-class Internet service at home, building my own box like Bart did and getting a few extra Ethernet ports.

Oh, and if you want to do it all with fewer Ethernet ports, check out OpenSolaris's Crossbow and its VNIC abstraction!

Have fun moving your network bits in new and interesting ways!

Detangling IPsec NAT-Traversal, and a more stable API

As of OpenSolaris build 73, the way we do IPsec NAT-Traversal changes for the cleaner.

Before this build, IPsec NAT-Traversal was performed by pushing a STREAMS module on top of an open UDP socket. This module (nattymod) would either strip UDP headers out of ESP-in-UDP packets, or strip the "0-SPI" marker (four bytes of zeroes) before passing the datagram up to the application.

This method worked, but it had some flaws, including the implicit setting of certain socket options (UDP_INCLHDR) that would then potentially be blocked from applications that actually required them. Also, nattymod did not perform the insertion of the 0-SPI automatically; the application was stuck doing that on its own. And while FireEngine merged TCP into IP for S10, we needed to wait for one of the earlier builds of OpenSolaris to get the UDP equivalent.

With the new NAT-Traversal scheme, a key management application (like our closed-source in.iked(1M)) that wishes to aid in NAT-Traversal simply sets a new socket option: UDP_NAT_T_ENDPOINT. If this option is set, the following things happen:
  • On inbound packets, the first four bytes after the UDP header are inspected.
    • If there are fewer than four bytes, the packet is assumed to be a NAT-T keepalive and is dropped.
    • If the four bytes are all zeros (i.e. the 0-SPI), they are stripped and regular UDP processing occurs.
    • Otherwise, the UDP header is stripped and the packet is shuffled off to IPsec's ESP for processing.
  • On outbound packets:
    • The 0-SPI is inserted between the UDP header and the user-generated data.
    • ESP will send ESP-in-UDP by itself depending on the Security Association's properties.
This will help anyone who wants to port their open-source IKE or other key management application to Solaris deal with the possibility of NAT boxes.
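To make the inbound and outbound rules listed above concrete, here is a minimal Python sketch of the decision logic. It is only an illustration of the rules as described, not the kernel code: the function names and the action labels are hypothetical, and the real work happens inside the UDP and ESP modules once UDP_NAT_T_ENDPOINT is set.

```python
# Sketch only: mirrors the inbound/outbound rules described above.
# Function names and action labels are hypothetical, not Solaris kernel APIs.

ZERO_SPI = b"\x00\x00\x00\x00"  # the "0-SPI" marker: four bytes of zeroes


def classify_inbound(udp_payload: bytes):
    """Return (action, data) for the bytes that follow the UDP header."""
    if len(udp_payload) < 4:
        # Too short to carry an SPI: assume a NAT-T keepalive and drop it.
        return ("drop-keepalive", b"")
    if udp_payload[:4] == ZERO_SPI:
        # 0-SPI marker: strip it and deliver the rest as ordinary UDP data.
        return ("deliver-udp", udp_payload[4:])
    # Non-zero SPI: this is ESP-in-UDP, so hand the payload to ESP.
    return ("to-esp", udp_payload)


def wrap_outbound(app_data: bytes) -> bytes:
    """Prepend the 0-SPI to application (e.g. IKE) data before it is sent."""
    return ZERO_SPI + app_data
```

With this shape, a key management daemon never sees or writes the 0-SPI itself: what it sends gets the marker prepended, and what it receives has the marker already stripped.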
And on a related note, this will be mentioned during Nicolas Droux's OpenSolaris Networking for Developers talk next week at Sun Tech Days in Boston. I'll be there too, talking about S10 and OpenSolaris security features, as well as being in the audience for Nicolas's talk.

Not using "ncp" on Niagara considered harmful

One of our IPsec remote-access servers here is on a Niagara-powered T2000 server. It's really overkill for the job, but we get to see how IKE and the Niagara crypto accelerator (known as "ncp" by its driver name) interact.

The nice thing about running your own stuff is you find things out before others do. Consider bug 6339802. We saw AWFUL IKE performance on Niagara boxes before we fixed this. Admittedly, IKE is single-threaded (for reasons beyond the scope of this blog entry), but it was taking seconds to complete an IKE Phase I with 2048-bit RSA and 1536-bit Diffie-Hellman.

More recently, we've been enabling bigger Diffie-Hellman MODP groups in our IKE. The Niagara driver has a limit of 2048-bit operations, so we limited the Phase I DH to 2048 bits.

Here's a DTrace script we like to use to measure responder-side Phase I times (in.iked and libike are closed-source, but trust me on this one):
#!/usr/sbin/dtrace -s

/*
 * Responder-side Phase I setup.
 */
pid$1::ssh_policy_new_connection:entry
{
        self->negstart[arg0] = timestamp;
        printf("Initial packet received, pm_info = %p", arg0);
}

pid$1::ssh_policy_negotiation_done_isakmp:entry
{
        /* Use 16384 value for "CONNECTED" from isakmp_doi.h */
        printf("return %d - %s ", arg1,
            (arg1 == 16384) ? "Success" : "Error case.");

        printf("pm_info %p finished, took %d ns", arg0,
            timestamp - self->negstart[arg0]);
}
With the fix for 6339802 in place, we can get pretty good phase 1 times....

dtrace: script '/space/responder-phase1.d' matched 4 probes
CPU     ID                    FUNCTION:NAME
  4  48040  ssh_policy_new_connection:entry Initial packet received, pm_info = bed6c8
  4  48042 ssh_policy_negotiation_done_isakmp:entry return 16384 - Success pm_info bed6c8 finished, took 165512764 ns
That's 165msec. Some of that time is packet round-trips, but let's ignore that now for this exercise.

Now let's do something drastic: cryptoadm disable provider=ncp/0 all. Suddenly those seconds come back...


4 48040 ssh_policy_new_connection:entry Initial packet received, pm_info = bed6c8
4 48042 ssh_policy_negotiation_done_isakmp:entry return 16384 - Success pm_info bed6c8 finished, took 7732419300 ns


WOW! Like Darren said, that's blog-worthy, and that's why I'm here.
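For a sense of scale, the two Phase I timings printed by the DTrace script can be compared directly. This small Python snippet just does the arithmetic on the two nanosecond values from the output above; the variable names are mine.

```python
NS_PER_MS = 1_000_000

with_ncp_ns = 165_512_764       # Phase I time with ncp enabled
without_ncp_ns = 7_732_419_300  # Phase I time after disabling ncp

with_ncp_ms = with_ncp_ns / NS_PER_MS
without_ncp_ms = without_ncp_ns / NS_PER_MS
slowdown = without_ncp_ns / with_ncp_ns

print(f"with ncp:    {with_ncp_ms:.1f} ms")
print(f"without ncp: {without_ncp_ms:.1f} ms")
print(f"slowdown:    {slowdown:.1f}x")
```

That works out to about 7.7 seconds against about 166 milliseconds, a slowdown of roughly 47x for a single negotiation.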

So why is Niagara so slow without its crypto accelerator?


That's a good question. Keep in mind that four big-number operations occur in an IKE Phase I exchange like I measured above: one RSA Signature, one RSA Verification, one Diffie-Hellman generate, and one Diffie-Hellman agree. I'm not 100% sure, but I believe the default software implementation of big-number operations on SPARC uses floating-point tricks to help out. Using floating-point on a Niagara kicks in a software emulation, which would definitely increase the time taken for each bignum operation.

So the moral of the story is to make sure you're exploiting all of the hardware that's available to you!

"Adaptation Issues", and a Neuromancer movie?

Ahh, what to do while test runs finish? Sometimes I get additional real work done, but when it gets close to a holiday, or just the end of a day, I visit movie websites.

Coming Soon recently quoted an article from Variety about one of my favorite novels - William Gibson's Neuromancer getting a producer for a (long-overdue, IMHO) movie.

Now granted, some of the book hasn't aged very well (Gibson's introduction to a recent re-issue asks about the lack of cell phones), but some of it would still translate VERY nicely to the big screen. Of course, the last major-release attempt to bring one of his works to the big screen was a dismal failure, story-wise, even though (again IMHO) the look-and-feel was close.

Another movie site I visit introduced me to a phrase - "adaptation issues". Basically, if one has enjoyed a story in one form, one may have a problem when that story is retold in another form. Ask die-hard (insert-book or comic here) fans about how (book or comic)'s movie version just messed things up SO BADLY. Often, some of their criticisms turn out to be well-based; other times, it's just acute fan{boy,girl} nitpicking. One's adaptation issues seem to be tied to how beloved the original source material is. My only real experience with adaptation issues was the reaction I had to Johnny Mnemonic, where I felt like there was a wonderful opportunity that had been squandered. I really hope Neuromancer's movie adaptation doesn't leave me feeling the same way.

To that end, I'm going to resurrect a conversation I had with a former movie-site writer about who should be cast in a Neuromancer movie. Hey Widge -- care to revisit that cast again?

Unnumbered interfaces confuse Quagga

The whole reason I was reading e-mail on a Sunday was not to look for telnetd exploits.

I was logged in because Team IPsec runs its punchin IPsec remote-access server (sometimes called a VPN server, but I hate that term because it's pushed by too many middlebox vendors) which was having routing problems.

As stated before, Solaris implements tunnels as point-to-point interfaces. For a remote-access server like we have in punchin, this means every external IP address gets a tunnel interface. (Until we had Tunnel Reform, this meant only one client per external IP address, which messed up NATs for multiple clients.) A tunnel interface has two addresses - a local one and a remote one. The local one can be shared with other tunnels or even with a different local interface (like the local ethernet). Such interfaces are called unnumbered interfaces.

A remote access server does forward packets, and is therefore by definition a router. One of our servers just swapped out Zebra (from an older OpenSolaris/Nevada build) for Quagga. We use Quagga's OSPF to learn the topology of the Sun internal network (the SWAN).

As clients "punch out", their tunnel gets destroyed. Now each of these tunnels shares the same local IP address with our ethernet to the SWAN. Unfortunately, these "interface down" events confuse Quagga, and suddenly all of my punchin clients can't move bits to the internal network anymore.

There is a workaround, and that's to assign a different local IP address than the one that is directly connected to the SWAN for use with all of the client tunnels. It's not that painful, as I only lose one out of 256 possible client addresses (our engineering ones only have a /24 from which to allocate client addresses). Still, as an esteemed colleague said, "I hope that's not the *whole* solution."

It isn't, and I would like to ask the Quagga community (as I've already asked our local routing folks, Paul Jakma and Alan Maguire) to make sure that Quagga and its routing protocols play nicely with unnumbered interfaces. It'll allow me to plumb tunnels until I'm all out of address space! :)

How OpenSolaris did its job during this telnet mess

I don't have a tag for general Security because dammit, I'm still a networking person who works on security! (UPDATE: I'm wiser now in 2020 and have added one.)

Anyway, you've seen elsewhere about how Alan H. turned around the S10 fix as quickly as he could. I'm going to tell you how Alan already found this:


D 1.67 07/02/11 19:46:41 danmcd 90 89 00009/00010/04896
6523815 LARGE vulnerability in telnetd


when he went to file a bug that'd already been putback into Nevada/OpenSolaris.

The best place to see what happened is to visit the OpenSolaris discussions, especially this thread.

I was reading e-mail on a Sunday because of an operations problem I was having with one of our punchin IPsec remote access servers. (I'll discuss the problem, a routing one, in a followup entry later today.) I found the initial note and read the PDF file to which "skunsul" so graciously provided a link. MAN I was embarrassed. After trying it on some lab machines and my laptop, I brought up the in.telnetd source (at the line number provided by Kingcope). My first approach was to verify the content of the $USER environment variable fed to in.telnetd. I compiled-and-ran the fix, which seemed to work. Great! Time to find some code reviewers.

My only regret about this was not putting the review on security-discuss@opensolaris.org or networking-discuss@opensolaris.org. I'll try to do better next time, especially for something that was announced on an opensolaris list initially. Anyway, two reviewers (OpenSolaris board member and well-known Sun Good Guy Casper Dik, and crypto framework expert Krishna Yenduri) suggested that login(1) is already getopt-compliant, and that I should just pass "--" between the rest of the arguments and the contents of $USER, no matter how *&^$-ed up it is. Because it was a Sunday, I didn't get rapid turnaround on e-mail replies. This is why the putback didn't happen until six hours after I'd read the note from skunsul. Krishna also recommended (in the spirit of open development) that I place the diffs on the very thread, and I did just that.
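The reviewers' "--" suggestion is easy to see with any getopt-style parser. Here is a minimal sketch using Python's getopt module rather than the real login(1): the option string and the hostile user name are made up for illustration, and login's actual flags differ.

```python
import getopt

# Hypothetical option string for illustration only; the real login(1)
# accepts a different set of flags. "f:" means -f takes an argument.
OPTSTRING = "pf:"


def parse_login_args(argv):
    """Parse login-style arguments the way a getopt-compliant program would."""
    return getopt.getopt(argv, OPTSTRING)


hostile_user = "-froot"  # illustrative user name crafted to look like an option

# Without "--", the user name is parsed as the -f option with argument "root".
unsafe_opts, unsafe_operands = parse_login_args(["-p", hostile_user])

# With "--", option parsing stops and the user name stays a plain operand.
safe_opts, safe_operands = parse_login_args(["-p", "--", hostile_user])
```

The appeal of this approach is that the caller doesn't have to guess which strings are dangerous. Everything after "--" is an operand, whatever it looks like.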

Anyone I know here who happened to have seen the initial note would've jumped on this in the same way - please don't think I did something others wouldn't do. My point is - this is the first security exploit reported to us via OpenSolaris, and I think the "Open" part of OpenSolaris helped out the code, as well as Sun's customers.

Tunnel Reform Code Review starts now.

Hey everyone!

The IPsec Tunnel Reform project's code review is now underway. Take a look and see what it took to bring up IPsec Tunnel-Mode processing in a world where tunnels are not actions from a policy, but rather a first-class network interface (or at least after Clearview it will be).

Highlights for administrators include:

  • Augmentations to ipsecconf(1m) to specify a tunnel interface's policy, whether it's S9-style IP-in-IP transport mode, or RFC 2401-compliant Tunnel Mode.

  • No changes to IKE configuration.

  • You can configure tunnel security without ifconfig(1m) using just ipsecconf(1m). We put all IPsec policy in ipsecconf(1m) and let ifconfig manage interfaces (and route(1m) manage routing).

  • Additions to ipseckey(1m) for manual tunnel-mode SA configuration, or monitoring of kernel interactions with Key Management.

  • Better interoperability with everyone else's Tunnel Mode IPsec.



Highlights for OpenSolaris-hackers include:

  • New per-tunnel policy structure: ipsec_tun_pol_t, which instantiates the existing policy-head per tunnel.

  • Getting rid of IRE_DB_REQ messages for SA addition/updates. This improves SA-adding performance and reduces the complexity of the ESP and AH modules.

  • New PF_KEY and PF_POLICY messages to reflect Tunnel Mode.

  • Shifting of tunnel IPsec policy enforcement from the lower-instance of IP to "tun" itself. (NOTE: This will change again when we merge with Clearview.)



Share your comments on tref-discuss, and let us know what you think!

Tunnel Reform now open for your perusal

NOTE: Links here point to docs that no longer exist. Maybe the Internet Archive might have 'em?

IPsec in Solaris has one missing piece, and we're about to put it in place.

The IPsec Tunnel Reform project aims to give Solaris and OpenSolaris an RFC 2401-compliant tunnel-mode implementation.

There are a lot of changes in the source base, some of which aren't open sourced (IKE), but most of which are in existing OpenSolaris code. The project page has a webrev showing the changes thus far. We're trying to be more open in our development processes here in the Solaris group, and showing you Tunnel Reform before we've finished it, AND before we've started major test efforts, is Team IPsec's own way of contributing to this openness.

Think of the source snapshot as a "Code Preview" instead of a "Code Review". There's a newly-rewhacked design document there too, and we'd like you to look at it and discuss it on the OpenSolaris communities or the tref-discuss@opensolaris.org mailing list.

And once we're done with this, we can think about RFC 4301 (2401's replacement) and friends, more zones support, SMF-izing things, giving TX labelled SA support... :)



ESP without authentication considered harmful

Hopefully you will read this and go "That's obvious". I'm writing this entry, however, for those who don't.

When IPsec was being specified over 10 years ago, attacks against cipher-block-chaining (CBC) encryption were understood. ESP has an authentication algorithm because AH had a vocal-enough opposition to merit having packet integrity in ESP also (there are also performance arguments for ESP-auth).

Now there are actual attacks with actual results. Kenny Paterson and Arnold Yau have published a paper with attacks against no-authentication ESP Tunnel Mode. I believe some of the techniques can also be employed against Transport Mode, but again, only with no authentication present.

The simple solution, of course, is to employ your choice of ESP Authentication (encr_auth_algs in ipsecconf(1m) or ifconfig(1m)) or AH (auth_algs in ipsecconf(1m) or ifconfig(1m)) with your IPsec deployment. We warn users about such configurations with ifconfig(1m) today. There is an RFE to eliminate or make very difficult encryption-only configurations in Solaris. Maybe someone in the OpenSolaris community would like to take a stab at it?

Dan's blog is powered by blahgd