Kebe Says - Dan McDonald's Blog

I Have No Whistle to Blow, But I Must Scream

I'm sure all twelve of you readers out there know what's been going on with respect to recent revelations about NSA activity. Among other things is the unnerving discovery that NSA has been attempting to actively dumb-down security for the Internet.

In the second linked article, Bruce Schneier calls upon people to blow the whistle on, "how the NSA and other agencies are subverting routers, switches, the internet backbone, encryption technologies and cloud systems." Here's the deal:

I have never been asked to introduce back-doors or weaken security in Solaris, OpenSolaris, Oracle Solaris 11 (for the four months I worked on it post-barn-door-closing), or Illumos. If there are weaknesses there, it was not because of any deliberate effort on my part.

You can view the kernel IPsec protocol sources (AH & ESP) here, by looking at ipsec*.c, sadb.c, spd.c, spdsock.c, keysock.c and header files in the directory above it. You can see the IPsec management utilities here. According to at least one well-known security researcher, the Illumos (nee OpenSolaris) IPsec code isn't bollocks.

There is no open-source IKE, because the libike.so.1 library was mostly OEM code, from a vendor whose technical lead let me co-write an RFC with him. You can, however, use the various observability and debugging tools in Illumos to see how things work, if you wish.

If you want to write your own, better, key management application for Illumos (or even Oracle Solaris), you can use PF_KEY to control the IPsec SADB. I detail the subsequent additions to RFC 2367 on my day-one-of-OpenSolaris blog post. If you want to work on IPsec in totally-open-source Illumos, you have my blessing, and I'll definitely be reviewing (and maybe integrating if you pass code reviews) your code.

Broad-Spectrum Dogfooding, or Why I Miss Jurassic.

NOTE: I imported this from blogspot and the embedded tweet was nicely there. Not sure if other self-hosted entries will be that cool.

I think most of you dozen readers know what I mean, when I refer to dogfooding. Some people think of Microsoft when they hear the term, but I first heard it from the same person via his being a Sun customer, AND via my old roommate, who worked for him.

I saw this Tweet last week:

I then checked out the blog post. It dealt with how an iSCSI LAN can be a failure point, partially due to the weakness of the ones-complement TCP/IP checksum.

Reading this reminded me of an old bug we found at Sun with either NFS or an ethernet device driver, and the only way we caught it was by using IPsec (AH particularly) and seeing packets fail the authentication check. The corrupt NFS packets had 16 bits' worth of 1 (0xffff) where they should have had 16 bits' worth of 0 (0x0000). Using the standard TCP/IP checksum, there's no difference between those two values, no matter where they fall in the packet. Using IPsec, however, even with HMAC-MD5, showed the packet failure clearly when the packet authentication check failed. This bug wouldn't have been discovered were it not for the Solaris Team's big honking server, jurassic, and how its multiple concurrent uses interacted with each other.
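To see why the standard checksum is blind to this particular corruption, here is a minimal Python sketch (not the Solaris code). It computes an RFC 1071 style ones-complement checksum over a made-up payload, flips one aligned 16-bit word from 0x0000 to 0xffff, and compares the results. The payload bytes, the key, and the function name are all invented for illustration, and HMAC-SHA-256 stands in for the HMAC-MD5 mentioned above, since any keyed MAC makes the same point.

```python
import hashlib
import hmac


def internet_checksum(data: bytes) -> int:
    """RFC 1071 style 16-bit ones-complement checksum."""
    if len(data) % 2:
        data += b"\x00"  # pad odd-length input with a zero byte
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]
        total = (total & 0xFFFF) + (total >> 16)  # end-around carry
    return ~total & 0xFFFF


# Hypothetical payloads: identical except for one aligned 16-bit word.
good = b"NFS data" + b"\x00\x00" + b"more data!"
bad = b"NFS data" + b"\xff\xff" + b"more data!"

key = b"illustrative-key"  # made-up key, for demonstration only
same_checksum = internet_checksum(good) == internet_checksum(bad)
same_mac = hmac.compare_digest(
    hmac.new(key, good, hashlib.sha256).digest(),
    hmac.new(key, bad, hashlib.sha256).digest(),
)

print("checksums match:", same_checksum)
print("MACs match:", same_mac)
```

In ones-complement arithmetic 0x0000 and 0xffff are both representations of zero, so adding either one to a nonzero running sum leaves it unchanged. The checksums match while the MACs do not.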

Even before there was OpenSolaris, people knew about jurassic. Solaris people's (not any old Sun people... Solaris people) posts on IETF mailing lists often showed user@jurassic. Jurassic served as the NFS source of home directories, and until the early 2000s e-mail inboxes as well. Every two weeks the in-development Solaris build would be placed upon jurassic. As a Solaris developer, if your changes broke jurassic, you fixed those changes immediately, or risked getting your changes yanked out. Not breaking jurassic was a great motivator for code quality. Also, if you had a new feature, you wanted it used on jurassic, even if not by everyone.

Once the basic IPsec protocols - AH & ESP - went into Solaris 8, I convinced the jurassic maintainers to protect all traffic between jurassic and a couple of workstations. One was mine, naturally. I encrypted all of my traffic to jurassic. Since we only had 100Mbit in our building at that time, the performance hit wasn't too bad, relatively speaking. Another belonged to an NFS developer, whom I'd somehow convinced to run AH, because I was already running ESP (and AH used fewer cycles for protection). It was this NFS developer, surprised he wasn't getting data corruption while others were, who helped suss out the bug in question.

At this point, I'd like to have a moment of silence for all of the made-public Solaris information that Oracle has since put back in its box. I could've had a bug id here, folks, A REAL BUG ID!!!

So for a few of us, jurassic also served as an IPsec testbed. It also was helpful in determining that nobody else's cleartext performance dropped while a few of us were running with IPsec-protected network traffic (put more succinctly, connection policy latching worked). Other services would run on jurassic as well: DNS, IMAP, and others I'm sure I'm forgetting. Jurassic core dumps eventually would be used to test out the then-new mdb (oh, those early ::findleaks results...), and I'm sure more than a few DTrace scripts helped diagnose some jurassic-discovered bugs.

At Nexenta, we make a dedicated storage appliance. Naturally, we use them inside where appropriate. We Nexentians (especially the ones in Lowell) use Illumos from other distributions for even greater effect. My Illumos Home Data Center talk touches upon these at about 10:43 in. We use Illumos to host VMs (Thank you Joyent), we use it for site-to-site VPNs, we will be using it for public services at some point, and everything I mentioned all runs on Illumos. It's not quite the magnifying glass Jurassic was, but we do what we can.

I believe Oracle still has jurassic around; I know it did prior to my 2011 departure. I suspect it's helping Oracle Solaris even today. I suspect, however, that a less dense, but more widely instantiated, broad-spectrum dogfooding continues on in Illumos today.

Delegated ZFS, cloning, and SCM

Well THAT was a long break from blogging...

One of the things that's happened in the illumos community is a subtle shift of the main illumos source repository from being primarily Mercurial to being primarily Git. This means I've had to learn Git. At first, I wasn't sure why people were so rabidly pro-Git. I found one of the big reasons:


everywhere(~/ws)[0]% /bin/time git clone git-illumos git-illumos.copy
Cloning into git-illumos.copy...
done.

real 11.8
user 4.7
sys 3.2
everywhere(~/ws)[0]% /bin/time hg clone illumos-clone illumos-clone.copy
updating working directory
44332 files updated, 0 files merged, 0 files removed, 0 files unresolved

real 1:52.6
user 28.9
sys 25.4
everywhere(~/ws)[0]%

Wow! Yeah, I can see why this would appeal to people. I'm still using Mercurial in a fair number of places, both for my illumos work and for Nexenta as well. I should show one other thing that both SCM cloning operations do: take up disk space.


everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -  66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time git clone git-illumos git-illumos.copy

*** SNIP! ***

everywhere(~/ws)[0]% sync
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G  99.6G         -  66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time hg clone illumos-clone illumos-clone.copy

*** SNIP! ***

everywhere(~/ws)[0]% sync
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   199G  98.7G         -  66%  1.00x  ONLINE  -
everywhere(~/ws)[0]%

I believe Git will also take up less disk space, but still, that's approximately half a gig or more for an illumos workspace. If it's populated, say with a preinstalled proto area and compiled objects, that'll be even larger.

Consider one of the great strengths of ZFS: its copy-on-write architecture. Take a local, on-disk master repo, say one you're pulling directly from the source, and make it its own filesystem. Child/downstream workspaces from your on-disk master now can be created using low-latency ZFS operations. Only two problems need to be solved: non-privileged usage, and SCM correction to properly designate the parent/child or upstream/downstream relationship.

Another useful ZFS feature is administrative delegation. Put simply, an administrator can allow an ordinary user to perform selected ZFS primitives on a given filesystem, and its descendants in the ZFS filesystem tree. For example:


everywhere(~)[0]% zfs allow rpool/export/home/danmcd
everywhere(~)[0]% zfs allow rpool/export/home/danmcd/ws
---- Permissions on rpool/export/home/danmcd/ws ----------------------
Local+Descendent permissions:
user danmcd clone,create,destroy,mount,promote,snapshot
everywhere(~)[0]%

I (as root) delegated several permissions for a subdirectory of $HOME to me (as danmcd). From here, I can create new filesystems in ~/ws, as well as destroy them, clone them, mount, snapshot, and promote them. All of these are useful operations. The syntax for delegation is mostly straightforward: zfs allow -ld clone,create,destroy,mount,promote,snapshot rpool/export/home/danmcd/ws. The -ld flags enable local and descendant permission propagation.

First thing I did was zfs create rpool/export/home/danmcd/ws/illumos-clone, followed by hg clone ssh://anonhg@hg.illumos.org/illumos-gate illumos-clone. This populates my local Mercurial illumos repo. I can perform a similar operation with git. Per my above timing examples, I did so with git-illumos.

I wrote a script to clone, promote, and reparent Git and Mercurial workspaces using ZFS operations. It's called zclone and it's here for download. It's still a work in progress, and I'd like to maybe have it end up in usr/src/tools in illumos-gate someday. (I'll try and update this particular post as things evolve.)
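For readers who want the shape of the idea without downloading anything, here is a minimal Python sketch of the steps such a tool might take. It is not the zclone script itself: the function name, the snapshot naming scheme, and the assumption that the Git remote is called "origin" are all mine. It only builds the command lines and never runs them, and it assumes the datasets mount under /export/home/danmcd/ws.

```python
from typing import Dict, List


def zclone_plan(parent_fs: str, child_fs: str,
                parent_path: str, child_path: str,
                scm: str) -> Dict[str, object]:
    """Describe, without executing anything, a ZFS-backed workspace clone."""
    leaf = child_fs.rsplit("/", 1)[-1]
    snapshot = f"{parent_fs}@{leaf}"  # snapshot named after the new workspace
    commands: List[List[str]] = [
        ["zfs", "snapshot", snapshot],
        ["zfs", "clone", snapshot, child_fs],
    ]
    hgrc_append = ""
    if scm == "git":
        # Point the child's "origin" remote at the on-disk master.
        commands.append(["git", "-C", child_path, "remote", "set-url",
                         "origin", parent_path])
    elif scm == "hg":
        # Mercurial reads its parent from [paths] default in .hg/hgrc.
        hgrc_append = f"[paths]\ndefault = {parent_path}\n"
    else:
        raise ValueError(f"unsupported SCM: {scm}")
    return {"commands": commands, "hgrc_append": hgrc_append}


ws = "rpool/export/home/danmcd/ws"
home = "/export/home/danmcd/ws"
plan = zclone_plan(f"{ws}/git-illumos", f"{ws}/git-illumos.zc",
                   f"{home}/git-illumos", f"{home}/git-illumos.zc", "git")
for argv in plan["commands"]:
    print(" ".join(argv))
```

The snapshot and clone steps need only the delegated snapshot, clone, create, and mount permissions shown earlier, which is why an ordinary user can run them. The last step is the "SCM correction": without it the child would still point at whatever the master repo was cloned from.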

Check out the times, and the disk space (not) used:


everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -  66%  1.00x  ONLINE  -
everywhere(~/ws)[0]% /bin/time zclone git-illumos git-illumos.zc
Created rpool/export/home/danmcd/ws/git-illumos.zc,
a zfs clone of rpool/export/home/danmcd/ws/git-illumos

real 1.0
user 0.0
sys 0.0
everywhere(~/ws)[0]% /bin/time zclone illumos-clone illumos-clone.zc
Created rpool/export/home/danmcd/ws/illumos-clone.zc,
a zfs clone of rpool/export/home/danmcd/ws/illumos-clone

real 1.0
user 0.0
sys 0.0
everywhere(~/ws)[0]% zpool list
NAME    SIZE  ALLOC   FREE  EXPANDSZ  CAP  DEDUP  HEALTH  ALTROOT
rpool   298G   198G   100G         -  66%  1.00x  ONLINE  -
everywhere(~/ws)[0]%

These are constant-time operations, folks. And like I said earlier, I suppose it's possible to have the local master repos populated with pre-compiled objects, header files in proto areas (an illumos build trick), and other disk-intensive operations pre-performed.

A quick search didn't yield me any results in this area: using ZFS to help make source trees take up less space. I'm surprised nobody's blogged about this or documented it, but I may have missed something. Either way, it doesn't hurt to mention it again.

On SOPA and PIPA

I can't say anything you haven't heard my tech friends say already on the subject. I can, however, quote this, because it's both funny and true:

"I think we need to drive a stake into this thing's heart, fill its mouth with garlic, cut off its head, expose it to sunlight and then throw the ash into a running body of water. It is vital that people not let up on the pressure merely because they appear to compromise."

Thank you Perry, for eloquently stating what should be SOPA's and PIPA's fates.

A Tale of Two Soccer Websites (A Security Story)

(Pardon the latency on this post. I had it in the Drafts section for a while.)

When a website requires a password for registration, said site SHOULD NOT EVER mail you back the password in the clear in an e-mail. Let me repeat that... SHOULD... NOT... EVER.

One of my daughters plays soccer, and has for two towns. My whole family enjoys seeing the Boston Breakers play soccer too. Both my daughter's town website (outsourced to Blue Sombrero) and the Boston Breakers' ticketing website (run by PMI ticketing using TicketSocket's technology) made the aforementioned mistake. Both of them quickly addressed the issue with direct and up-front e-mails. I believe Blue Sombrero addressed the problem a bit quicker, but that's because of a combination of smaller organizations and the Breakers' mistake happening on a weekend.

The Blue Sombrero handling of my daughter's old town website mistake was quick, and without incident. Hats off (no pun intended) to the Blue Sombrero folks, who I hope have implemented the no-mailing-passwords policy throughout their entire customer base.

One bad thing someone in the Breakers organization did was remove my original complaining posts on Facebook. I suspect this was merely the case of panic and not active malice. The General Manager of the Breakers, Andy Crossley, sent me a mail on Saturday to see what was going on. Once he understood the problem, he got the relevant technical folks involved, and they solved things.

While I'm glad to see quick turnaround on these flaws, the one piece of advice I will reiterate is NEVER SEND OUT CLEARTEXT PASSWORDS. Thank you.

Finding Ada, but with better technology examples!

I found out thanks to Denny Gentry about Ada Lovelace Day today. Denny has a great blog post citing three engineers and their work with ATM.

The three engineers are wonderful examples of excellence, ones I'd gladly mention. What bugs me is that he cited... ewww.... ATM. His third paragraph mentioned why I go, "ewww..." over ATM. He didn't have to deal with (I think) some of the politics of ATM zealots, but that doesn't take away from Allyn's, Sally's, or Renee's abilities or contributions.

In fact, it's not difficult to cite further contributions from each of them... two of which I can further support with source code!

First off, Sally Floyd is well known for much TCP and congestion control goodness. If you followed the link to Sally's page you can see all (or at least most) of her work for yourself. I unfortunately don't know of any quickly-linkable code to cite, but I'll gladly accept suggestions.

Allyn Romanow was an engineer at Sun, and worked in my old group (Solaris Internet Engineering) while she was there. Her big contribution to the Solaris TCP/IP stack was the support for large, fast networks (aka RFC 1323), which you can see scattered throughout the TCP code, particularly here.

Renee Danson (now Sommerfeld), also an engineer at Sun, escaped the world of ATM to join Internet Engineering later on. I was fortunate to have her land with Team IPsec for a while. As we were bringing up IKE for Solaris 9, I was hoping to have a command-line tool alter the running IKE daemon using the Solaris lightweight IPC mechanism known as doors. Renee made this happen. Because of a large OEM component, the IKE daemon source isn't available for browsing, but the control program, ikeadm(1M), is there for the world to see.

An unofficial IETF slogan was, "We believe in rough consensus and running code." I figured it's even better to find Ada with some running code to back it up.

MTV is 30, and do you remember PopClips?

MTV (Music Television... at least it used to be), is 30 today (or yesterday depending on how late I post this). I'm sure lots of people have written about this already. I'd recommend checking out this YouTube channel if you want a glimpse into the past. It has commercials inserted (which I believe weren't actually on MTV in those early days), but otherwise should stir some 1981 memories.

I'm here to write about a precursor to MTV that I remember seeing months before MTV appeared. Nickelodeon used to air a show on Sunday nights called "PopClips". Internet searching on it turns up very little. The Wikipedia article sums up all of my own recollections, and includes the tidbit that former Monkee Michael Nesmith produced the show.

Do any of you half-dozen readers who are approximately my age (40s) remember PopClips? I remember seeing some good videos on there that did eventually make their way to MTV (my favorite specifically from PopClips was "Walking on the Moon" from The Police). The amount of collective net data on PopClips is surprisingly sparse.

WRITE_SAME support now in Illumos COMSTAR

The WRITE_SAME primitive is now available in Illumos as of this push:
13382:d84aa76f7cd2 Dan McDonald 
937 WRITE_SAME support for COMSTAR
Reviewed by: Gordon Ross 
Reviewed by: Richard Elling 
Reviewed by: Robert Gordon 
Approved by: Gordon Ross  
Sumit Gupta wrote the original contribution, and after a bit of my own massaging, it's now in Illumos. Unlike the UNMAP push, this one did not have a lot of rewhacking (in large part due to its lower amount of direct interaction with ZFS).

The WRITE_SAME primitive works pretty much like its name. The iSCSI initiator passes in a WRITE_SAME primitive along with a single disk block. The iSCSI target then writes the same block over the range of logical block addresses specified in the command.

One set of experiments I did prior to integration was figuring out what size buffer to allocate for an I/O. In a perfect world, you don't want to do sbd_write() calls for every 512-byte block. On the other hand, you also don't want to force the kmem allocator to perform unholy tasks of allocation. I settled on a default of 128kbytes, which has a kmem_cache magazine backing it up (according to kmem stats). Users can experiment with this themselves by tweaking stmf_sbd's sbd_write_same_optimal_chunk variable. Every WRITE_SAME request, once it generates the data, consults this variable prior to allocating a block. Source-junkies can look here for the function in question.
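The chunking trade-off can be illustrated with a small Python simulation. This is a sketch only, not the stmf_sbd code: the "disk" is a bytearray, and the function and constant names are invented. It replicates the single block into one staging buffer of at most the chunk size, then issues one write per chunk rather than one per 512-byte block.

```python
BLOCK_SIZE = 512
OPTIMAL_CHUNK = 128 * 1024  # mirrors the 128 KB default described above


def write_same(disk: bytearray, lba: int, nblocks: int, block: bytes,
               chunk_size: int = OPTIMAL_CHUNK) -> int:
    """Replicate one block over nblocks starting at lba; return write count."""
    if len(block) != BLOCK_SIZE:
        raise ValueError("block must be exactly one logical block")
    if chunk_size < BLOCK_SIZE:
        raise ValueError("chunk must hold at least one block")
    total = nblocks * BLOCK_SIZE
    offset = lba * BLOCK_SIZE
    if offset + total > len(disk):
        raise ValueError("range exceeds device size")
    # One staging buffer: a whole number of blocks, no larger than needed.
    buf_len = min(total, chunk_size - chunk_size % BLOCK_SIZE)
    buf = block * (buf_len // BLOCK_SIZE)
    writes = 0
    done = 0
    while done < total:
        n = min(len(buf), total - done)
        disk[offset + done:offset + done + n] = buf[:n]  # one simulated write
        done += n
        writes += 1
    return writes


disk = bytearray(1024 * 1024)  # hypothetical 1 MB device, all zeroes
writes = write_same(disk, lba=8, nblocks=1024, block=b"\xab" * BLOCK_SIZE)
print("writes issued:", writes)
```

Here 1024 blocks (512 KB) go out in four 128 KB writes instead of 1024 single-block writes. Shrinking chunk_size toward 512 pushes the write count back up, which is the cost the default is meant to avoid.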

Happy block-writing, folks!

Showing your kids the Star Wars films - which order?

WARNING: Spoilers for the Star Wars movies. Here's some old-school spoiler space...










We finally finished (in fits and starts) showing our twin 8-year-olds all six Star Wars films. We showed them in the order Wendy and I saw them on the big screen: 4 (Star Wars), 5 (The Empire Strikes Back), 6 (Return of the Jedi), 1 (The Phantom Menace), 2 (Attack of the Clones), and finally 3 (Revenge of the Sith). Now I'll admit we skipped over char-broiled Anakin and Vader's suit-fitting during Sith, but they're 8, what would you expect?

As we finished up, something occurred to me. I remember reading my favorite online-exclusive film critic (and fellow parent) Drew McWeeny mentioning toward the bottom of this article that he was going to show them to his son in a slightly different order: 4, 5, 1, 2, 3, 6.

That's a fascinating way to show them. At the end of The Empire Strikes Back the first-time viewer may have a question about whether or not Darth Vader is Luke's father. Why not, at that point, show the first-time viewer the story of Anakin Skywalker? This works especially well now, where the special-edition Empire uses Ian McDiarmid's Emperor Palpatine, and an astute child will notice how much Darth Sidious resembles him (or even Senator Palpatine).

Commenters (oh gotta love Internet feedback... makes me glad I only have a half-dozen readers) mention a few other orders: 1, 2, 3, 4, 5, 6 ("to get the crap out of the way"), or the flip-flop 1, 4, 2, 5, 3, 6 (tracking both in single steps).

There's a little part of me that wishes we tried the flashback-in-the-middle approach, but the only thing that matters is that our girls enjoyed the movies, and now they get one or two more of the jokes Wendy and I make.

For Illumos newbies: On developing small

I just finished a chat with a person who's doing a device driver, and he was worried that a certain header file wasn't available in his /usr/include. This struck me as odd, as I always get my headers from the workspace's proto area...

Then I realized I've had 15 years at Sun under my belt and this person's a complete newbie.

I haven't looked very closely at the Illumos build instructions, but I'm going to do some things now that will help kernel module writers (e.g. device drivers) get started without resorting to a full build right off the bat. I'll assume that you've installed the appropriate compilers and the "onbld" package so that you have a populated /opt/onbld/bin.

STEP 1: The /opt/onbld/bin/ws command:

When you go to work in an Illumos source base, you're best off "entering it" via the ws command. I've hacked my .tcshrc to print a different prompt when I'm in with ws. Here, check it out:
everywhere(~)[1]% ws ws/to_mhi

Workspace                    : /export/home/danmcd/ws/to_mhi
Workspace Parent             : /export/home/danmcd/ws/illumos-clone
Proto area ($ROOT)           : /export/home/danmcd/ws/to_mhi/proto/root_i386
Parent proto area ($PARENT_ROOT) : /export/home/danmcd/ws/illumos-clone/proto/root_i386
Root of source ($SRC)        : /export/home/danmcd/ws/to_mhi/usr/src
Root of test source ($TSRC)  : /export/home/danmcd/ws/to_mhi/usr/ontest
Current directory ($PWD)     : /export/home/danmcd/ws/to_mhi

WS-everywhere-WS(~/ws/to_mhi)[0]% 
You'll notice a few things got set in the environment. What I use to alter my .tcshrc is the CODEMGR_WS variable. You should do the same in your favorite shell's config.

UPDATE: You will need to set SPRO_ROOT and BUILD_TOOLS after invoking ws. I do this already in my .tcshrc, but forgot to report it. A newer tool, bldenv, fixes this, but currently at the cost of a configuration file. There's talk of merging ws's simplicity with bldenv's completeness.

One of the key concepts in building Illumos is the "proto area". This is a version of the root filesystem that lives within your source tree. You'll see it set above. There's one per basic architecture type (i386 or sparc). When a full "nightly" build happens, the proto area gets populated with headers, libraries, commands, kernel modules, etc., and then the packaging tools sweep up their input from the proto area. The proto area contains more than what is on a running system.

You need to populate your proto area with basics (directory structures, etc.) to start.

WS-everywhere-WS(~/ws/to_mhi)[1]% cd $SRC
WS-everywhere-WS(usr/src)[0]% pwd
/export/home/danmcd/ws/to_mhi/usr/src
WS-everywhere-WS(usr/src)[0]% dmake sgs
       < Go get a drink of water or coffee, it's gonna be a bit... >
WS-everywhere-WS(usr/src)[1]% 
The "sgs" target sets up the proto area completely.

If you're proceeding to build, say, kernel modules, you should populate the kernel include files in the proto area.

WS-everywhere-WS(~/ws/to_mhi)[0]% cd usr/src/uts
WS-everywhere-WS(src/uts)[0]% dmake install_h
    < TONS of output deleted... >
WS-everywhere-WS(src/uts)[0]% 
UPDATE Fellow Illumos hacker Rich Lowe has informed me that "dmake setup" does both sgs and install_h in one fell swoop.

And then you can go and compile your kernel module. I'll use "ip" as an example:

WS-everywhere-WS(src/uts)[1]% cd intel/ip
WS-everywhere-WS(intel/ip)[0]% pwd
/export/home/danmcd/ws/to_mhi/usr/src/uts/intel/ip
WS-everywhere-WS(intel/ip)[0]% dmake
     < MORE output deleted... >
WS-everywhere-WS(intel/ip)[0]% 
If you want to lint-check your module, don't do the obvious "make lint" but instead do "make modlintlib". This will perform basic lint sanity without the overhead of a full crosscheck.

Now if you want to do something in userland, you'll need to do more than a simple header install. You MIGHT need to bring up libraries too, because it's possible your workspace's libraries have different versions than the machine you're actually building on.

WS-everywhere-WS(intel/ip)[0]% cd $SRC/lib
WS-everywhere-WS(src/lib)[0]% 
If you utter "dmake install", it's going to be a while. You can, if you know only a certain library was altered, cd into that library and utter "dmake install" in there. For example:

WS-everywhere-WS(src/lib)[0]% cd libipsecutil
WS-everywhere-WS(lib/libipsecutil)[0]% dmake install_h
     < output deleted... >
WS-everywhere-WS(lib/libipsecutil)[0]% dmake install
     < MORE output deleted... >
WS-everywhere-WS(lib/libipsecutil)[0]% 
Then you can go to, say, your new command, and start compiling and debugging there. Once you're done, you can exit this shell, and it will return you to your original pre-ws shell.

Hopefully this will lower some of the barriers to entry for budding Illumos hackers.
