Wednesday, June 18, 2008

Hibernate: Cache Queries the Natural Id Way

I work on the MySQL Enterprise Tools team, formerly of MySQL and now with Sun Microsystems. The 2.0 version of the Enterprise Monitor is well under way. As part of this, the Java server backend has been refactored to utilize Spring and Hibernate. Honestly, I didn't know either one of those technologies before starting this project. Oh, what a fun road it has been...

A big draw of using an off-the-shelf ORM was not having to write our own caching implementations (kind of bad and slightly wrong -- those darn transactions) like we had for the custom one-off ORM that existed previously. A lot of our internal meta-model is very static, so clearly caching would be a HUGE win for performance, right?

Not so fast, turbo. Let me continue...

The headline feature for the 2.0 Monitor is "Query Analysis." Coupled with the MySQL Proxy, the Monitor receives query data captured on its way to/from a MySQL server. Once at the Monitor, the data can be aggregated, analyzed, and reported upon. What better test for this feature than to use it on ourselves, to tune ourselves!

And this brings me back to hibernate caching. In the course of monitoring ourselves, I noticed that a certain query was happening WAY more often than it should have been, just based on my gut feeling. The query in question loaded an object that was generally static -- save for one value that represents how frequently some data should be collected. It's the only mutable value, and once in place, it rarely changes.

Hrm... how to debug. First, we checked the cache settings. Whoops -- WAY too low for both the cache expiry timeout and the max cache elements. Fix that. Still sucks. Some cursory hibernate source and log perusal showed that the cache for these objects was being invalidated at a rapid rate. Yes, the entire cache. Even though the objects are essentially static, the query cache takes the safe route and says any change to a related table invalidates any and all queries that reference that table. This makes sense, from a generic cache standpoint. But I thought to myself -- surely there has to be a way. *I* am smarter than hibernate in this case, and *I* can more rightly determine when the query results should be invalidated. Lucky for me, hibernate allows you to extend the StandardQueryCache "up to date" policy checks. w00t. I implemented one, overrode the timeout policy for the object(s) in question, and re-ran tests. FAILURE. Turns out I am not smarter than hibernate.
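
(For the curious, the hook is the protected isUpToDate() check on StandardQueryCache, plugged in through a custom QueryCacheFactory registered as hibernate.cache.query_cache_factory. Below is a rough sketch of the kind of thing I mean -- the class and region names are invented, the signatures are from the Hibernate 3.2-era cache SPI so verify them against your version, and, as I said, the whole idea turned out to be wrong for us anyway.)

    import java.util.Properties;
    import java.util.Set;

    import org.hibernate.cache.QueryCache;
    import org.hibernate.cache.QueryCacheFactory;
    import org.hibernate.cache.StandardQueryCache;
    import org.hibernate.cache.UpdateTimestampsCache;
    import org.hibernate.cfg.Settings;

    // Sketch only: a query cache that skips table-timestamp invalidation for one
    // dedicated region and trusts region expiry/eviction instead. Don't do this.
    public class LenientQueryCache extends StandardQueryCache {

        public LenientQueryCache(Settings settings, Properties props,
                UpdateTimestampsCache updateTimestampsCache, String regionName) {
            super(settings, props, updateTimestampsCache, regionName);
        }

        protected boolean isUpToDate(Set spaces, Long timestamp) {
            // "*I* am smarter than hibernate": ignore the table timestamp check
            // and let cached results live until the region expires them.
            return true;
        }

        // Registered via hibernate.cache.query_cache_factory; only the hand-picked
        // region gets the lenient behavior, everything else stays standard.
        public static class Factory implements QueryCacheFactory {
            public QueryCache getQueryCache(String regionName,
                    UpdateTimestampsCache updateTimestampsCache,
                    Settings settings, Properties props) {
                if ("mostlyStaticQueries".equals(regionName)) {
                    return new LenientQueryCache(settings, props, updateTimestampsCache, regionName);
                }
                return new StandardQueryCache(settings, props, updateTimestampsCache, regionName);
            }
        }
    }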

However, in the process of implementing the custom query cache policy, I had debugged through some more hibernate code and noticed that "natural id" queries are treated specially. Some more google-fu, and I quickly came across Queries by Natural Identifier in the hibernate docs.

Now, the docs just aren't real clear on what optimizations hibernate can make internally when you use a Criteria with a natural id restriction. But, as I had just been in that section of code, I could connect the dots. Here's the meaty bit -- if you make a natural id / key lookup, and hibernate recognizes it as such, it can bypass the table timestamp invalidation and go directly to the second-level cache to fetch the object. Hibernate knows, with an immutable and unique natural key, that a table modification will not affect the composition of the object in question (of course, an object modification would, and the object would have been evicted from the L2 cache).
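
In code, a lookup that gets this treatment looks something like the sketch below. The CollectionConfig entity and its serverUuid/instrumentName properties are stand-ins invented for this post (a mapping sketch for it appears a bit further down); the important bits are Restrictions.naturalId() and setCacheable(true).

    import org.hibernate.Session;
    import org.hibernate.criterion.Restrictions;

    public final class CollectionConfigLookup {

        private CollectionConfigLookup() {
        }

        // Cacheable natural-id lookup: hibernate recognizes the naturalId()
        // restriction and can serve the result straight from the second-level
        // cache, skipping the table-timestamp "is this query stale?" check.
        public static CollectionConfig find(Session session, String serverUuid, String instrumentName) {
            return (CollectionConfig) session.createCriteria(CollectionConfig.class)
                    .add(Restrictions.naturalId()
                            .set("serverUuid", serverUuid)
                            .set("instrumentName", instrumentName))
                    .setCacheable(true)
                    .uniqueResult();
        }
    }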

I cannot overemphasize the utility of this discovery. You see, we were making frequent inserts into the table, but existing objects (rows) were almost never changed. Without the natural key lookup, though, every insert invalidated all results in the query cache. And that's why I was seeing way more selects for the same objects than I had anticipated.

Some quick assurances that we mapped the natural id correctly, some quick refactoring of the HQL into Criteria queries with natural id restrictions, and whammo, we're good. Let's run the tests and query analysis again... ruh roh. OH COME ON! (not my exact reaction, but I think you can guess what it was really like).

I was confident that the natural id cache lookup optimization was what I really, really, really wanted, so there had to be something else going on. More debuggage ensued. I set a breakpoint near the spot in StandardQueryCache where I had first noticed the query cache optimization. Lo and behold, the hibernate metadata for saying "i am a natural key lookup" was returning false.

I am not amused. I am confident my hibernate mapping is correct, because the unique index was present in the schema. Think. Think. Think. Well, I had recently been on an effort to move from the hbm xml mappings to hibernate annotations mappings. @NaturalId support was, in fact, the very reason I had recently upgraded the annotations jar. On a hunch, I reverted the persistence mapping back to the xml form. Debug, the metadata returns the correct value... test, and YES, finally -- the queries issued are in line with my expectations and the rows present in the database. I. Have. Won.
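
For reference, here is roughly the shape of the annotations-based mapping that tripped the bug -- again, the entity and field names are invented stand-ins, not our actual model. The hbm.xml workaround expresses the same key with a <natural-id> element wrapping the two properties.

    import javax.persistence.Column;
    import javax.persistence.Entity;
    import javax.persistence.GeneratedValue;
    import javax.persistence.Id;

    import org.hibernate.annotations.Cache;
    import org.hibernate.annotations.CacheConcurrencyStrategy;
    import org.hibernate.annotations.NaturalId;

    // Sketch of the annotations-based mapping that triggered the bug: the unique
    // constraint gets created, but the criteria lookup is never flagged as a
    // natural-id lookup at runtime.
    @Entity
    @Cache(usage = CacheConcurrencyStrategy.READ_WRITE)
    public class CollectionConfig {

        @Id
        @GeneratedValue
        private Long id;

        // The immutable business key: never updated once the row exists.
        @NaturalId
        @Column(nullable = false, updatable = false)
        private String serverUuid;

        @NaturalId
        @Column(nullable = false, updatable = false)
        private String instrumentName;

        // The one mutable attribute: how often the data should be collected.
        private long frequencyMillis;

        // Whether collection is supported at all (see the ps at the end of the post).
        private boolean supported = true;

        protected CollectionConfig() {
            // for hibernate
        }

        public CollectionConfig(String serverUuid, String instrumentName, long frequencyMillis) {
            this.serverUuid = serverUuid;
            this.instrumentName = instrumentName;
            this.frequencyMillis = frequencyMillis;
        }

        public long getFrequencyMillis() { return frequencyMillis; }
        public void setFrequencyMillis(long frequencyMillis) { this.frequencyMillis = frequencyMillis; }

        public boolean isSupported() { return supported; }
        public void setSupported(boolean supported) { this.supported = supported; }
    }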

Being the good open-source citizen, I made a hibernate forum post that detailed my findings, including simplified sample code demonstrating the problem. The good folks on the hibernate forum (after questioning the silliness of my contrived example) were quick to recognize the problem, and I got a hibernate jira issue opened.

The workaround, obviously, is to leave the xml mapping in place until the fix makes it into a hibernate release. Not too bad of a deal, I guess, considering the overall win I now have in my cache hit ratio.

In conclusion -- if it makes sense for your data model, the natural id query cache optimization can be a huge performance win for your app. If you have immutable or rarely-changed objects that you look up by a constant natural key -- look into the Criteria natural id restriction. And use the xml mapping until the bug is fixed.

ps -- there is one other performance note to consider, actually. If you use the natural id query and it returns no rows, that NULL result will not be cached. So, if 'not found' results are more common than 'row found' results, you will still see tons of queries you don't expect. Either stop using the natural key optimization (if 'not found' is the common case), or extend your object/schema to include a 'not supported' field. In our case, the lack of a row meant "not supported," and we already had a "not supported" flag for things that had been supported but later went away. For things that were frequently "not supported," I simply went ahead and created the object/row anyway and set the flag to false -- thus ensuring the natural key optimization was not subverted.
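
To make that concrete, here's a hypothetical sketch reusing the made-up CollectionConfig entity and lookup helper from above (the findOrCreate name and the supported flag are inventions for the example):

    import org.hibernate.Session;

    public final class CollectionConfigs {

        private CollectionConfigs() {
        }

        // Guarantee a row exists for every (server, instrument) pair so the
        // natural-id lookup always returns something cacheable. A missing row
        // used to mean "not supported"; that state now lives in the supported flag.
        public static CollectionConfig findOrCreate(Session session, String serverUuid,
                String instrumentName) {
            CollectionConfig config = CollectionConfigLookup.find(session, serverUuid, instrumentName);
            if (config == null) {
                // hibernate will not cache the empty result, so persist a placeholder
                // row with supported=false instead of relying on "no row found".
                config = new CollectionConfig(serverUuid, instrumentName, 0L);
                config.setSupported(false);
                session.save(config);
            }
            return config;
        }
    }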


Free Light Show

A few weeks ago some really bad storms came through central Texas. Since I watch TV via DVR, I had no idea a storm was coming, much less how severe it was. I didn't find out until I came back to my computer and found that several friends had IMed me "it's headed right for you!" and "are you in your basement?" (to which: this is Texas -- no one has basements).

I flipped on the news, and the weatherman was speaking of some obvious rotation "two miles north/northeast of Pflugerville." Yeah, uhh... that's right about where I live. In fact, the weatherman's map showed it probably went about a mile north of my house.

But, strangely enough -- no damage. The rest of Austin had plenty, but we survived without a scratch. Perhaps because we have no trees at all in our neighborhood...

I captured some lo-res video of the lightning. Give it a bit at the beginning -- it's not easy to capture lightning video, you know.


Tuesday, June 17, 2008

Dueling Shared Libraries

I'm famous! Well, in my own mind. I've made it onto Jan Kneschke's blog, as we tackled an ugly problem on Mac OS X when trying to use a self-built MySQL Monitor Agent.

I won't plagiarize, so go read it yourself.

JUnit Eats Your Memory

Once again, we started getting OutOfMemoryErrors in our JUnit test suite. Some necessary refactoring re-introduced me to an old foe that I had battled once before. I figured, what the hell -- an excellent opportunity for my first blog post.

There's an important FAQ in JUnit that should be bold, blinking, italicized, and highlighted. The gist is -- if you assign an instance member in a JUnit TestCase (declaration, setUp(), or otherwise), you damn well better assign it to null in tearDown().

You see, JUnit creates an instance of the TestCase class for EVERY test method. The way it is architected, it would have you believe that the TestCase class in question is instantiated once, every test method is called with setUp() and tearDown() pre- and post-, and all is glorious. But that is not the case. Every test method gets its own test case instance. Oh -- and the important detail -- references to ALL of those instances are held until the END of the test suite run.

So, if you do any sort of non-trivial testing with even moderately large (but transient) objects, be sure to set the refs in your testcase to null in tearDown() so they can be garbage collected.
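
In code, the pattern looks like this (JUnit 3.x style; the test class and its oversized fixture are invented just to make the point):

    import junit.framework.TestCase;

    // Sketch of the pattern: JUnit 3.x keeps every TestCase instance alive until
    // the whole suite finishes, so any big fixture you don't null out in
    // tearDown() stays reachable for the entire run.
    public class ReportGeneratorTest extends TestCase {

        private byte[] bigFixture; // stands in for any large, transient test object

        protected void setUp() throws Exception {
            super.setUp();
            bigFixture = new byte[10 * 1024 * 1024]; // ~10MB allocated per test method
        }

        protected void tearDown() throws Exception {
            bigFixture = null; // without this, every instance pins its ~10MB until the suite ends
            super.tearDown();
        }

        public void testFixtureIsAvailable() {
            assertNotNull(bigFixture);
        }
    }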

Better yet -- let's find a newer tool than JUnit. Just because it is ubiquitous doesn't mean it is the best tool.