Friday, July 25, 2008

Code Formatting Manifesto

First, let me say that there is only "one true format" for any given programming language. And, of course, that is my format. Then again, the one true format for you -- is your preferred format.

How do we solve this problem today? On personal projects, everyone uses their own one true format, since they are the masters of their own destiny. For shared projects, job related or otherwise, we often rely on code formatting standards.

Now, I'm not anti-coding standard by any means. I'm also not strict on what I feel is the one true format in any given situation. We have human minds that are adaptable. We are trained in the syntax and layout of code, and what it means. If a brace starts on this line or the next, I can grok it. However, there is something to be said for consistency within a project. I believe it boosts productivity, and that's why I'm not anti-coding standard(s).

But why does this have to be so? The meaning of the code itself doesn't change depending on where you put the brace (as long as the semantic structure doesn't change). The compiler doesn't care.* Why should we?

I think it all stems from the fact that the format of code is inherently tied to how it is stored -- as unstructured text documents. Can we do better than that?

How many years now has the visual representation of some document been isolated from its storage format? I'm too young to remember the beginnings of TeX and other typesetting languages. Throw in WYSIWYG word processing next. The office word processor that shall remain nameless used a binary storage format for a long time, only recently switching to some XML internal structure (perhaps compressed, but still XML). And definitely uses structured text (xml, compressed on disk), to describe documents. In the case of (most) WYSIWYG word processing, though, you are describing layout instead of structure, but there are good examples where that is not the case (I leave that as an exercise to the reader).

I give you the quintessential example -- HTML. HTML describes documents, usually intended for human viewing, in a structured fashion. How they are *visually* represented is completely up to the renderer! Hell, it doesn't even have to be visual. Sight-impaired folk can have their HTML documents read to them.

An HTML page rendered on my phone has the same content as one rendered for Firefox, but they look completely different. Even HTML on a given website can be rendered differently, if the designers were so forward thinking as to make the page skinnable via dynamic CSS changes. Again, the content is the same. Only the presentation changes.

Now, I come full circle. Why can't this *very same* (old) idea be applied to code? Let's separate the storage format from the presentation/editing format. I argue that we should be able to store, for example, Java code with all unnecessary whitespace removed. Load it up in your fancy new rendering editor, and your rendering/formatting preferences are applied for the visualization and editing of the file. When you save it, the editor does the opposite -- it removes any formatting-only text and saves the file in "canonical" form.
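To make the load/save round trip concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names tokenize, canonicalize and render are hypothetical, the tokenizer only covers a toy subset of a Java-like language, and comments and character literals are not handled at all. A real editor would work from a proper parser.

```python
import re

# Toy tokenizer: string literals, words/numbers, a few two-character
# operators, then any other single punctuation character.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\w+|==|!=|<=|>=|&&|\|\||\+\+|--|[^\w\s]')

def tokenize(source):
    return TOKEN.findall(source)

def canonicalize(source):
    """Storage form: keep a space only where two tokens would otherwise merge."""
    out, prev = [], None
    for tok in tokenize(source):
        # e.g. "int" + "x" would re-tokenize as "intx", "+" + "+" as "++"
        if prev is not None and tokenize(prev + tok) != [prev, tok]:
            out.append(" ")
        out.append(tok)
        prev = tok
    return "".join(out)

def render(canonical, indent="    ", brace_on_new_line=False):
    """Editing view: lay the tokens out according to one person's preferences."""
    lines, current, depth = [], [], 0

    def flush():
        # Emit the pending tokens as one line at the current nesting depth.
        if current:
            lines.append(indent * depth + " ".join(current))
            current.clear()

    for tok in tokenize(canonical):
        if tok == "{":
            if brace_on_new_line:
                flush()
            current.append(tok)
            flush()
            depth += 1
        elif tok == "}":
            flush()
            depth -= 1
            current.append(tok)
            flush()
        elif tok == ";":
            # Glue the semicolon to the previous token and end the line.
            if current:
                current[-1] += tok
            else:
                current.append(tok)
            flush()
        else:
            current.append(tok)
    flush()
    return "\n".join(lines)

stored = canonicalize("class Point { int x; int y; }")
mine = render(stored)                                     # brace on the same line
yours = render(stored, indent="  ", brace_on_new_line=True)  # brace on its own line
assert canonicalize(mine) == canonicalize(yours) == stored
```

Two people can open the same stored text with different preferences, and whatever either of them saves collapses back to the same canonical string, so the repository never sees the difference.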

Syntax highlighting is arguably the (minimalist) first step. The colors on your keywords are not stored in the file. They are added by the editor/IDE as part of visualizing the code. Some editors are simple text-matching stupid, but others grok the structure and "do the right thing."

Next, there are plenty of "pretty printer" reformatting tools out there. Eclipse does it. And I believe there are other tools for other languages that do it. People use them to enforce coding standards as an "on commit" step into the source code repository. Code checked in is automatically run thru a formatter and is committed in canonical form.

Well, I say screw all that jumping thru hoops. Let's make this the editor's job. If we can already syntax highlight, and thusly grok real code structure, and already auto-format to configurable specifications, then let's take it to the next step. Let the editor do it on every load and save.

The one argument I see against this is a potential for losing nice diff-ability. If we store in 'canonical' format (perhaps compressed to minimal whitespace), and I want to diff revisions, it gets slightly more difficult. The diff tool would need to understand the rendering process as well, so you would likely have to use a diff feature built into the editor. Otherwise, your diff tool of choice would need the same ability to render from the canonical format to your view of choice. Again, I'll use Eclipse as my argument -- it provides more of a structural diff view anyway (not just +++ --- text added/removed stuff). Since it already understands the code structure of new vs old (regardless of view and storage format), it shouldn't have any problems if the stored format is not the same as the viewed format. The idea actually plays BETTER with this kind of diff, because you see actual structural changes and not just text format changes. Line ending changes and JoeBob Teammate's goofy reformatting no longer show up as diffs, and potentially don't even need to be saved because no *content* has actually changed. This is a good thing.
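As a rough illustration of a diff that ignores layout, here is a small Python sketch that compares token sequences instead of lines. It is a stand-in for a real structural diff, not what Eclipse does: token_diff and its toy tokenizer are hypothetical names, and comments are not handled. It uses difflib.SequenceMatcher from the standard library.

```python
import difflib
import re

# Toy tokenizer: string literals, words/numbers, single punctuation characters.
TOKEN = re.compile(r'"(?:\\.|[^"\\])*"|\w+|[^\w\s]')

def tokens(source):
    return TOKEN.findall(source)

def token_diff(old, new):
    """Return (tag, old_tokens, new_tokens) for every run of tokens that changed."""
    a, b = tokens(old), tokens(new)
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    return [(tag, a[i1:i2], b[j1:j2])
            for tag, i1, i2, j1, j2 in matcher.get_opcodes()
            if tag != "equal"]

old = "int total(int a, int b) { return a + b; }"
reformatted = "int total(int a, int b)\r\n{\r\n\treturn a + b;\r\n}\r\n"
changed = "int total(int a, int b) { return a - b; }"

# Brace placement, tabs and line endings produce no diff at all...
assert token_diff(old, reformatted) == []
# ...while a real change shows up as exactly the tokens that differ.
assert token_diff(old, changed) == [("replace", ["+"], ["-"])]
```

A line-based diff would flag every line of the reformatted version; the token view reports nothing, which is the behavior argued for above.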

Anyway, I'm curious what the hapless reader of this blog thinks. I tried some cursory googling, but nothing following my ideas comes across in terms of actual programming. There are plenty of pretty printers and web code-sample displayers etc. These all have the same end goal as my idea, but none take it back to the actual editing step. Do you know of such a tool with the features I desire?

If I ever get magical "free time" I might play with some Eclipse code to see if my idea would work. The pieces all appear to be there... just gotta knit them together. Yay, open source!

* This argument only works for stream oriented languages. I'll ignore Python for the moment, but any language where whitespace/indenting is meaningful doesn't deserve acknowledgment anyway.

MySQL and the Missing Rows

I was doing some multi-threaded, multi-transactional testing for the backend of the MySQL Enterprise Monitor. I came across a weird failure, where it appeared I was able to successfully insert a row, and then (in the same transaction), a select from the same table did not return any rows.

Consider the following transactions:

mysql> create table t1 (id integer primary key auto_increment, name varchar(32) unique) engine=innodb;

a> begin;

b> begin;
b> select * from t1;

a> insert into t1 values(null, 'oldag');

b> insert into t1 values(null, 'oldag') on duplicate key update id=LAST_INSERT_ID(id);
(b blocks)

a> commit;

(b is released)
Query OK, 0 rows affected (0.00 sec)
b> select * from t1;
Empty set (0.00 sec)

what, what, what?!?!

So, the gist is that the insert appears to succeed (notice the ok, no warnings or errors), but it doesn't affect any rows. Surely, this cannot be correct. So, let's perform a thought experiment.

B begins a transaction, and selects from t1. He locks in his transaction read view (repeatable read isolation level) at whatever it was right then. A then makes a modification in another transaction. B then does the insert, but it is *not* actually an insert. A normal insert would have failed with a duplicate key constraint failure. But the lovely "on duplicate key update" feature turns that into an update request on the existing row with the constraint. But wait -- notice the update is designed to essentially be a no-op. The LAST_INSERT_ID trick is used so that Hibernate gets the correct id for the now-not-inserted row (call it poor man's lazy create). Additionally, MySQL has the optimization that, if an update really wouldn't change anything, then by god -- don't change anything! Hence, the "0 rows affected."

Where does that leave us? We have a frozen transaction read view. We have an apparently successful insert (data manipulation) that should be visible in the same transaction. Due to internal trickery, no data manipulation actually occurred. Thusly, the 'B' transaction still has the same frozen read view as before the insert, and no rows appear in the select.

So, you can argue with me all day about the technical internals of what the database engine is doing, and why it is correct, and why it would be hard to change. But I'll argue back that the client experience is unexpected and therefore wrong. An insert with no errors (or warnings) is furthermore not visible immediately after a successful return. I don't care that there were zero rows affected. That is accurate. There WERE zero rows affected. So what. Insert. Select. Show me my rows!

So, what actually may make it impossible... Consider if transaction A had modified/inserted a lot more rows. But then B only inserts a subset of those. How do you make only those rows visible (the ones B inserted/updated) but not the ones that B didn't 'touch?' Granted, it's a hard problem, and in pure transactional database land, it might just be impossible. And, "on duplicate key update" is a MySQL-ism that throws a kink in the whole works.

Ok, now that I've told you it is impossible, I will tell you how to make it work. ;)

There is a workaround. Remember how I said that the update is a no-op if nothing is actually updated? Well, what if we actually FORCE some kind of update to happen? In this case, I added a 'dummy' insert_count column, starting at zero. I then changed the statement to "on duplicate key update id=LAST_INSERT_ID(id), insert_count = insert_count + 1". This forces an update to occur. The data manipulation is recognized, and the row becomes visible in the transaction. It occurred to me soon after that I should probably be doing Hibernate 'versioning' on the objects with optimistic locking anyway, and maybe that would play nicely with this.

Wednesday, July 9, 2008

Oh Noes -- My Mac is infected!

... with a Windows virus, or with Windows?

While running Skype on my MacBook Pro, I got this lovely little instant message...


ATTENTION ! Security Center has detected
malware on your computer !

Affected Software:

Microsoft Windows Vista
Microsoft Windows XP
Microsoft Windows 2000
Microsoft Windows NT Server 4.0
Microsoft Windows Win98
Microsoft Windows Server 2003

Impact of Vulnerability: Remote Code Execution / Virus Infection /
Unexpected shutdowns

Recommendation: Users running vulnerable version should install a repair
utility immediately

Your system IS affected, download the patch from the address below !
Failure to do so may result in severe computer malfunction.

[url removed]
I find that just absolutely hilarious. What a grand scheme -- spam random Skype addresses with some Windows virus message. For what it is worth, I also get the same kind of random Windows-specific virus warnings when visiting certain websites.

Good thing I'm a smug mac bastard and don't have to worry about such things (for now).

Saturday, July 5, 2008

Watch out for the "Texas School for the Blind" van

The title may at first appear to be a joke. But as we were leaving a July 4th fireworks show last night, a white van cut us off in the melee of traffic that clearly everyone was having to deal with while leaving the park. Sure, people cut you off. It happens. No need to get angry.

Then -- I saw it. The van had a giant Texas School for the Blind placard on the side. First some minor chuckles. Then a real 'lol' moment. The irony of being cut off by *this* specific van was too much. And yes, hah hah -- I get that a blind person was not driving. It's still funny. Boo-hiss on you.

Then the real question settled in -- not to be insensitive, but what were they doing at a fireworks display anyway... listening to them go off?

Genius Bar: a.k.a. Mac'tard Central

I succumbed to the infamous burnt power supply issue with my MacBook Pro. After two years of solid use and abuse, it finally gave up the ghost.

I'm still covered under AppleCare pro support, and as this is a known issue -- I figure getting a new adaptor is a no-brainer. A quick call to the AppleCare support line, and the nice gentleman on the other end agrees. We make an appointment for the next day at the local Apple Store. In the meantime, I head to Fry's to pick up a new supply, as this is my work machine. My battery is also crappy and lasts less than an hour, and I've got plenty more work to do than that.

The next day, I head to my "Genius Bar" appointment around lunch time. Right on time, my name is called. This is the last thing the genius does well.

First, he has no idea why I'm there. Second, he has no idea I'm currently under AppleCare pro coverage. The phone support representative said both of these things would be "in the system" and I'd be taken care of right away. There are three possibilities here. Either the genius didn't check the system to see why I was there, or the phone support person did not put the info in the system, OR... there is no system at all.

Third, I have to convince the mac'tard -- oops, genius -- that I'm really covered by AppleCare. I brought my laptop just in case, but I also had my original AppleCare coverage certificate. The mac'tard says "duhhh, errrr... I'm not sure it has the info I need to look up your information. Can I just get the serial number from your computer?" Look, mac'tard -- right THERE (I show him) -- that IS the serial number. "duhhh, errrrr --- let me enter it in and see if it works." Needless to say, it was the correct serial number. The astonished "genius" then declares me valid and legal.

Now, as if I was not having enough fun already...

I finally get the requisite "so, what seems to be the problem." Eyes rolling, I said "'scuse me, while I whip this out" (boy, I wish I really did say that). I show him my obviously charred power adaptor. The genius is perplexed. He turns it over, and around, and flexes the cable, smells it, etc. I'm surprised he didn't stick his tongue on the burnt part to taste it.

After several minutes of deep contemplation, the answer comes: "I believe your adaptor shorted out." WOW. I bow and kneel in the awesome presence of the obviously superior mac-intellect of the certified Genius standing before me. Clearly, I am not worthy and should run from the store screaming giddy joy like a little school girl.

Due to my obviously lesser mac-intellect, I now get the obligatory/derogatory solution that these geniuses must be trained to dole out. "Sir, we will replace your adaptor THIS time... but let me show you how to wrap your cord so it doesn't happen again. See, you need to loop the thin cord before wrapping it around the posts..." I stop him mid-sentence (and yes, this part is true). I pull out my brand-new adaptor (purchased the night before) from my laptop bag. It has the very stress-reducing loop of which he was so graciously attempting to inform me. I actually say "I'm not retarded. I've done it this way for TWO YEARS." [blank stare -- both directions] "uhhhh, ok. let me get that part for you."

Finally convinced of his mac'tardedness, the genius fetches a new power brick from the back. Sign here, date that. And, I'm done.

So, I get that the Genius Bar probably spends 99% of the time dealing with regular idiots. I mean, we are talking about the Apple Store here. But I'm quite certain, when I show you a burnt, charred power adaptor -- you should just fetch me a new one.

At the end of the day, I now have two power supplies -- which actually makes life a tad easier for this nomadic worker. My primary work-spot at home now has a permanent adaptor. Setting up and leaving that spot is slightly less of a burden.

Apple support sent me a follow-up email, asking me to rank the "experience." I'm definitely going to reply with the appropriate negative feedback, but boy am I tempted to just link to this post...