Kebe Says - Dan McDonald's Blog

End-to-end Research Group is ending

Let me quote BBN's Craig Partridge on the Internet Research Task Force's end2end-interest mailing list:
Dear Friends and Colleagues:

After 26 years, the End-to-End Research Group has decided to cease existence
as of January 1st, 2010. While there is certainly still end-to-end research
to be done, the group had ceased to effectively serve as a forum for those
discussions.

The E2E group had a great run, serving as a place where many researchers
could bring their ideas for initial, informal, airing. The meetings could be
bruising. (At one meeting, a member tried to encourage a speaker by saying
"We're all friends here" only to pause and say, "No, I'm sorry, actually we
eat our young, but proceed anyway"). But the meetings usually also brought
insights.

Ideas that were tested in E2E meetings include slow start and improved
round-trip time estimation, Random Early Drop, Integrated and Differentiated
Services, Weighted Fair Queuing, PAWS, and Transaction TCP.
When I learned about the group (and their enlightening e-mail list), my networking professor described it as covering, "End to end, and everything in between..." Now you half-dozen readers know the exact origin of my previous (was "this") blog's name.

Luckily, the mailing alias will still be around. Still, the cliche, "End of an era," really applies here. It's yet another sign of the Internet's maturity, and that the really new places for research are probably somewhere not a lot of people are examining.

Anyone else have something to say about the End-to-End Research Group going away?

FiOS vs. Comcast?

Hello you half-dozen readers! The simple question is stated above.

The full question is a bit more complicated. I currently am a mostly-satisfied Comcast business-class customer in a place with no other options. I will be moving to a place where I can choose between Verizon FiOS and Comcast business-class. My initial forays suggest I can keep my 6Mbit/1Mbit with /29 prefix of static IPs, or pay $20-30/month more and push it up to 20Mbit/5Mbit. Having 5Mbit out is very tempting, but apart from making my family download pictures from www.kebe.com faster, I'm not sure it's worth it.

Comcast won't bundle consumer TV and phone with business-class Internet; will FiOS? I also have some technical issues with my /29 static IP block on Comcast. Will I have the same issues with FiOS? Will I even be able to get a /29 with FiOS?

Any clues are, as always, welcome!

Endian-independence -- NOT just for kernel hackers

Yesterday on Facebook, OpenSolaris community member Stephen Lau said:
thought i was done caring about endianness when i left kernel programming... oops
I quickly replied:
You put bits on a {network,disk} that transcend architectures, you worry about byte-order.
I've often wondered why people with apps for Solaris on SPARC are often concerned about getting it to work on Solaris for x86 and vice-versa. Seeing Stephen equate byte-order-sensitivity to kernel-hacking suddenly made me realize the problem: byte-order sensitivity is everyone's problem.

Any time your program puts a multi-byte value in a network packet or a disk block, it is highly likely that another program on a platform with a different byte order will attempt to read that packet or disk block. Never mind the historical holy wars about byte order: even today, there are enough big-endian and little-endian platforms out there that you cannot assume either one.

It's really not tough to write endian-independent code. The first thing you need to decide is how to encode your disk/network data. Most Internet apps use a canonical format (which is big-endian for things in RFCs). There have been some schemes to have a universally-encoded format (XDR or ASN.1), but these can often be big-and-bulky. OS research in the early 90s proposed a scheme of "receiver makes right", where a producer tags the whole data with an encoding scheme, and it is then up to the receiver to normalize the data to its native representation.
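To make "receiver makes right" concrete, here is a minimal sketch under my own assumptions: the one-byte tag, the five-byte record layout, and every function name below are hypothetical, not taken from any real protocol. The sender writes its value in native order and tags it; the receiver swaps only when the tag differs from its own byte order.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define ORDER_BIG       0
#define ORDER_LITTLE    1

/* Hypothetical record: one tag byte, then a 32-bit value in sender order. */
#define RECORD_LEN      5

/* Report this host's byte order by looking at the first byte of a probe. */
int
host_order(void)
{
        const uint16_t probe = 1;

        return (*(const uint8_t *)&probe == 1 ? ORDER_LITTLE : ORDER_BIG);
}

/* Reverse the four bytes of a 32-bit value. */
uint32_t
swap32(uint32_t v)
{
        return ((v >> 24) | ((v >> 8) & 0x0000ff00U) |
            ((v << 8) & 0x00ff0000U) | (v << 24));
}

/* Sender: no conversion, just tag the data with the native byte order. */
void
record_send(uint8_t *rec, uint32_t value)
{
        assert(rec != NULL);
        rec[0] = (uint8_t)host_order();
        memcpy(&rec[1], &value, sizeof (value));
}

/* Receiver: normalize to native order, swapping only if the tag differs. */
uint32_t
record_receive(const uint8_t *rec)
{
        uint32_t value;

        assert(rec != NULL);
        memcpy(&value, &rec[1], sizeof (value));
        if (rec[0] != host_order())
                value = swap32(value);
        return (value);
}
```

The appeal is that two hosts with the same byte order never swap at all; the cost is that every receiver has to understand every encoding a sender might tag.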

Regardless of encoding scheme, if you are reading data from network or disk, the first step is to normalize the data. Different architectures have different aids to help here. x86 has bswap instructions to swap big endian to x86-native little endian. SPARC has an alternate space identifier load instruction. A predefined alternate space (0x88) is the little-endian space, which means if you utter "lduwa [address-reg] 0x88, [dst-reg]" the word pointed to by [address-reg] will be swapped into [dst-reg]. The sun4u version of MD5 exploits this instruction to overcome MD5's little-endian bias, for example. Compilers and system header files should provide the higher-level abstractions for these operations, for example the hton{s,l,ll}() functions that Internet apps often use. After manipulating data, encoding should follow the same steps as decoding. Also, in some cases (e.g. TCP or UDP port numbers), the number can often just be used without manipulation.
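If you'd rather not lean on platform-specific instructions or headers, normalization can be written with plain shifts. This is a minimal sketch, and get_be32() and put_be32() are names I made up for illustration, not functions from any system header. Because the shifts operate on values rather than on memory, the same source gives the same answer on big-endian and little-endian hosts.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: read a 32-bit big-endian ("network order") value
 * from a byte buffer, regardless of the host's native byte order.
 */
uint32_t
get_be32(const uint8_t *buf)
{
        assert(buf != NULL);
        return (((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
            ((uint32_t)buf[2] << 8) | (uint32_t)buf[3]);
}

/*
 * Hypothetical helper: the encoding side, following the same steps as
 * decoding but in the other direction.
 */
void
put_be32(uint8_t *buf, uint32_t value)
{
        assert(buf != NULL);
        buf[0] = (uint8_t)(value >> 24);
        buf[1] = (uint8_t)(value >> 16);
        buf[2] = (uint8_t)(value >> 8);
        buf[3] = (uint8_t)value;
}
```

Reading byte-by-byte also sidesteps alignment problems when the value sits at an odd offset in a packet.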

Some have called for compiler writers to step up and provide clean language-level abstractions for byte-ordering. I'm no language lawyer, but I've heard the next revision of Standard C may include endian keywords:

        /*
         * Imagine a UDP header with language support!
         */
        typedef struct udph_s {
                big uint16_t uh_sport;   /* Source port */
                big uint16_t uh_dport;   /* Destination port */
                big uint16_t uh_ulen;    /* Datagram length */
                big uint16_t uh_sum;     /* Checksum */
        } udph_t;
Today, these fields need htons() or ntohs() calls wrapping references to them. Of course, there would be a lot of (otherwise correctly-written) existing code that would need to be rewritten, but such a type-enforced scheme would reduce errors.
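For comparison, here is roughly what the wrapping looks like today without language support. It's a sketch only: udph_fill() and udph_length() are hypothetical helpers of mine, the checksum is left out, and I'm assuming a POSIX system that provides htons() and ntohs() in <arpa/inet.h>.

```c
#include <arpa/inet.h>  /* htons(), ntohs() */
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef struct udph_s {
        uint16_t uh_sport;      /* Source port (network order) */
        uint16_t uh_dport;      /* Destination port (network order) */
        uint16_t uh_ulen;       /* Datagram length (network order) */
        uint16_t uh_sum;        /* Checksum (network order) */
} udph_t;

/* Hypothetical helper: fill in a header from host-order values. */
void
udph_fill(udph_t *uh, uint16_t sport, uint16_t dport, uint16_t payload_len)
{
        assert(uh != NULL);
        uh->uh_sport = htons(sport);
        uh->uh_dport = htons(dport);
        uh->uh_ulen = htons((uint16_t)(sizeof (*uh) + payload_len));
        uh->uh_sum = 0;         /* Checksum computation omitted. */
}

/* Hypothetical helper: return the datagram length in host order. */
uint16_t
udph_length(const udph_t *uh)
{
        assert(uh != NULL);
        return (ntohs(uh->uh_ulen));
}
```

Nothing in the type system stops you from forgetting one of those calls, which is exactly the class of error a type-enforced scheme would catch.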

Finally, one other cause of non-portable code is doing stupid tricks based on how multi-byte integers are stored. For example, on little-endian boxes:

        /* This won't work on big-endian boxes. */
        uint32_t value = 3;
        uint32_t *ptr32 = &value;
        uint8_t *ptr8 = (uint8_t *)&value;

        assert(value == *ptr8);  /* Barfs on big-endian... */
People micro-optimize based on such behavior, which limits such code to little-endian platforms only. A compiler can exploit the native platform's representation to make such optimizations redundant, and any compiler guys in the half-dozen readers can correct or confirm my assertion.
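A sketch of the portable alternative, with helper names that are mine and purely illustrative: ask for the low byte by value instead of peeking at storage. first_stored_byte() is the same pointer trick as above, kept only to show where the two approaches diverge.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Portable: masks the value and never looks at how it is stored. */
uint8_t
low_byte(uint32_t value)
{
        return ((uint8_t)(value & 0xff));
}

/* Non-portable trick: whatever byte happens to be stored first. */
uint8_t
first_stored_byte(const uint32_t *valuep)
{
        assert(valuep != NULL);
        return (*(const uint8_t *)valuep);
}

/* Hypothetical helper: nonzero if the least significant byte comes first. */
int
host_is_little_endian(void)
{
        const uint32_t probe = 1;

        return (first_stored_byte(&probe) == 1);
}
```

low_byte(3) is 3 everywhere, while first_stored_byte() on a variable holding 3 yields 3 on little-endian hosts and 0 on big-endian ones.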

New IPsec goodies in S10u7

Hello again. Pardon any latency. This whole Oracle thing has been a bit distracting. Never mind figuring out the hard way what limitations there are on racoon2 and what to do about them.

Anyway, Solaris 10 Update 7 (aka. 5/09) is now out. It contains a few new IPsec features that have been in OpenSolaris for a bit. They include:
  • HMAC-SHA-2 support per RFC 4868 in all three sizes (SHA-256, SHA-384, and SHA-512) for IPsec and IKE.
  • 2048-bit (group 14), 3072-bit (group 15), and 4096-bit (group 16) Diffie-Hellman groups for IKE. (NOTE: Be careful running 3072 or 4096 bit on Niagara 1 hardware, see here for why. Niagara 2 works better, but not optimally, with those two groups.)
  • IKE Dead Peer Detection
  • SMF Management of IPsec. Four new services split out from network/initial:
    • svc:/network/ipsec/ipsecalgs:default -- Sets up IPsec kernel algorithm mappings.
    • svc:/network/ipsec/policy:default -- Sets up the IPsec SPD (reads /etc/inet/ipsecinit.conf).
    • svc:/network/ipsec/manual-key:default -- Reads any manually-added SAs (reads /etc/inet/secret/ipseckeys).
    • svc:/network/ipsec/ike:default -- Controls the IKE daemon.
  • The UDP_NAT_T_ENDPOINT socket option from OpenSolaris, so you can develop your own NAT-Traversing IPsec key management apps without relying on in.iked.
We've even more goodies in OpenSolaris, BTW.

DOH! Ekiga.net MAILS your password back to you

Make sure you don't pick a good password for ekiga.net -- they mail it back to you IN THE CLEAR in an e-mail message.

I'm so furious, I can't even begin to describe it. Did I miss fine-print on their page saying they'd do something this stupid?

UPDATE: They also store your password in the clear on-disk. Check out ~/.gconf/apps/ekiga/protocols/%gconf.xml if you wanna see it in all of its cleartext glory!

Dear Santa... Steve... Tim Cook - A 64GB iPhone, please?

Waiting for three test boxes to install never helps one's concentration.

So tell me -- when are flash chips going to shrink small enough to allow a 64GB iPhone? Yes, I said it: "64GB iPhone." Not, "64GB iPod Touch," but, "64GB iPhone." I understand that the Touch has two slots for flash, whereas the iPhone only has one (the Phone chips take up the Touch's other slot space). But quite honestly, I own a working phone (RAZRv3c, GSM) and a half-full 80GB iPod, and would jump at the opportunity to reduce these two devices into one. It's probably just a matter of time (and money, at least initially), but since... yep, all three machines are still installing, I'm going to vent to the half-dozen or so readers.

And while I'm at it, perhaps any iPhone fan{,atic}s in the audience can confirm or deny:
  • With the advent of the App Store, is there a working voice-dial app that I can use with an existing (cheap Motorola) bluetooth headset? Preferably where I only need to touch said headset and shout to make a call?
  • How painful is rolling your own ringtones?
  • (For bonus points) Is there a non-jailbreak way of getting a terminal program running?
I suspect the answers are, "Sorta", "Annoying-but-doable", and "Yeah, right", respectively. I've done a little googling already, but the more clues I have, the merrier.

Any clues are, as always, welcome. Looks like one of those test boxes is finishing up. Time for test setup...

Way to go, Disney!

I just read an article stating that Disney will be including a regular DVD with several Blu-Ray releases. As both a new Blu-Ray owner, and a parent of two five-year-old Disney fans, I'm pretty thrilled about this. Our SUV/crossover has a DVD player for long trips (our rule: one-way trips of >= 1 hour only), and we were curious about what we were going to do if/when we started moving to Blu-Ray for our purchases. I don't know if this will affect Pixar releases (our Wall-E, for example, has no DVD copy), but it'll be nice knowing that at least for some purchases, we get the hi-def version AND one we can play on those >= 1 hour car trips.

How to tell when a performance project succeeds?

The Volo project is an effort to improve the interface between sockets and any socket-driven subsystems, including the TCP/IP stack. During their testing, the kernel panicked in some IPsec tests. See this bug for what they reported.

In our IPsec, we have LARVAL IPsec security associations (SAs). These are SAs that reserve a unique 32-bit Security Parameters Index (SPI), but have no other data. If a packet arrives for a LARVAL SA, we queue it up, so that when it gets filled in by key management, the packet can go through. We do this because of the IKE Quick Mode Exchange, which looks like this:

        INITIATOR                               RESPONDER
        ---------                               ---------

        IKE Quick Mode Packet #1  ------->

                                <----------     IKE Quick Mode Packet #2

        IKE Quick Mode Packet #3  -------->
Now once the initiator receives Quick Mode packet #2, it has enough information to transmit an IPsec-protected packet. Unfortunately, the responder cannot finish completing its Security Association entries until it receives packet #3. It is possible, then, that the initiator's IPsec packet may arrive before the responder has finished processing IKE. Let's look at the packets again:
        INITIATOR                               RESPONDER
        ---------                               ---------

        IKE Quick Mode Packet #1  ------->

                                <----------     IKE Quick Mode Packet #2

        ESP or AH packet        ---------->     Does this packet...

        IKE Quick Mode Packet #3  -------->     ... get processed after my
                                                receipt of #3, also after
                                                which I SADB_UPDATE my
                                                inbound SA, which changes it
                                                from LARVAL to MATURE?

Now the code that queues up an inbound IPsec packet for a LARVAL SA is sadb_set_lpkt(), as was shown in the bug's description. It turns out there was a locking bug in this function - and we even had an ASSERT()-ion that the SA manipulated by sadb_set_lpkt() was always larval. The problem was, we discounted the possibility of IKE finishing between the detection of a LARVAL SA and the actual call to sadb_set_lpkt().

The Volo project improved UDP latency enough so that the IKE packet wormed its way up the stack and into in.iked faster than the concurrent ESP or AH packet. The aforementioned ASSERT() tripped during Volo testing, because we did not check the SA's state while holding its lock. Had we done so, we could have told that the LARVAL SA had been promoted to MATURE, and gone ahead and processed the packet.
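The shape of the fix is easier to see in code. This is a heavily simplified sketch and not the actual Solaris source: sa_t, sa_init(), and sa_set_lpkt() are hypothetical stand-ins, and a pthread mutex stands in for the kernel lock. The point is only that the LARVAL test happens while the SA's lock is held, so a concurrent promotion by key management can't slip in between the check and the queuing.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

typedef enum { SA_LARVAL, SA_MATURE } sa_state_t;

/* Hypothetical, simplified SA; not the real Solaris structure. */
typedef struct sa_s {
        pthread_mutex_t sa_lock;
        sa_state_t      sa_state;
        void            *sa_lpkt;       /* Packet queued while LARVAL */
} sa_t;

/* Hypothetical helper: set up a fresh LARVAL SA with nothing queued. */
void
sa_init(sa_t *sa)
{
        assert(sa != NULL);
        (void) pthread_mutex_init(&sa->sa_lock, NULL);
        sa->sa_state = SA_LARVAL;
        sa->sa_lpkt = NULL;
}

/*
 * Try to queue pkt on a LARVAL SA.  Returns 1 if the packet was queued,
 * or 0 if the SA is no longer LARVAL, in which case the caller should
 * go ahead and process the packet normally.
 */
int
sa_set_lpkt(sa_t *sa, void *pkt)
{
        int queued = 0;

        assert(sa != NULL);
        (void) pthread_mutex_lock(&sa->sa_lock);
        /* Decide under the lock; any earlier peek at sa_state may be stale. */
        if (sa->sa_state == SA_LARVAL) {
                sa->sa_lpkt = pkt;
                queued = 1;
        }
        (void) pthread_mutex_unlock(&sa->sa_lock);
        return (queued);
}
```

A caller that gets 0 back treats the SA as usable and processes the packet instead of asserting that the SA must still be larval.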

This race condition had been present since sadb_set_lpkt() was introduced in Solaris 9, but it took Volo's improved performance to find it. So hats off to Volo for speeding things up enough to find long-dormant race conditions!



Addendum - IKEv2 does not have this problem because its equivalent to v1's Quick Mode is a simpler request/response exchange, so the responder is ready to receive when it sends the response back to the initiator.

Racoon2 on OpenSolaris - first tiny steps

NOTE: A version of this was sent to the racoon2-users alias also.

I've been spending some of my time bringing up racoon2 (an IKEv2 and IKEv1 daemon) on OpenSolaris.

Because of vast differences in PF_KEY implementations between OpenSolaris and other OS kernels, I've spent my racoon2 time actually getting IKEv1 to work first, instead of IKEv2. Right now, what's working is:
  • IKEv1 initiates and derives IPsec SAs for single-algorithm IPsec policies.
That's it! The IKEv1 responder needs work, as does all of IKEv2, as does support for policies that offer a choice of multiple algorithms. But there's enough change in there to say something now.

ARCHITECTURAL DIFFERENCES

The most noteworthy change in the OpenSolaris work so far is that literally there's no spmd (a separate IPsec SPD daemon racoon2 uses) required for now. This is because:
  • We don't have the indirection between ACQUIREs and the appropriate policy entry. Our extended ACQUIREs contain everything needed to construct a proposal. There's no SPD consultation required with an OpenSolaris ACQUIRE.
  • Our responder-side logic uses inverse-ACQUIRE, which will provide the same structure as ACQUIRE w.r.t. proposal construction. This is the closest we get to needing something like spmd, and given its syntactic equality to an extended ACQUIRE, we can use it on rekeying if the responder initiates the next time.
If spmd serves another purpose, we will revisit it. As it stands, however, I cannot see us using it.

CODE DIFFERENCES

In OpenSolaris, we use the "webrev" tool to generate easy-to-review web pages with diffs of all varieties. The webrev for what I have so far in racoon2 is available at:
http://cr.opensolaris.org/~danmcd/racoon2-opensolaris/
Feel free to make comments or suggestions about what I've done.

How to rescue data from an iBook with thermal problems

My wife Wendy has had her iBook G4 for not-quite four years now. We had to return it once before via AppleCare due to thermal problems. Well, the thermal problems are back, and this time, there's no AppleCare for us to invoke. I managed to get the machine to behave itself only after leaving it powered off for a bit, but then it would lock again. I'd heard stories about putting computers in refrigerators to keep them cool enough to run. I never thought I'd try it myself.

We do, however, have a freezer in the basement. So check this out:

Brrrr

I managed to get Wendy's home-directory off, and that's what mattered. I'm heading off to the Apple Store to get a new MacBook (thank goodness for the just-arrived George-and-Nancy "Will you be my friend with this stimulus?" check). I hope to do the frozen data transfer one more time to bootstrap the new MacBook.
