A big draw of using an off-the-shelf ORM was that we no longer had to maintain the (kind of bad and slightly wrong -- those darn transactions) caching implementation we had written for the custom one-off ORM that existed previously. A lot of our internal meta-model is very static, so clearly caching would be a HUGE win for performance, right?
Not so fast, turbo. Let me continue...
The headline feature for the 2.0 Monitor is "Query Analysis." Coupled with the MySQL Proxy, the Monitor receives query data captured on its way to and from a MySQL server. Once at the Monitor, the data can be aggregated, analyzed, and reported on. What better test for this feature than to use it on ourselves, to tune ourselves!
And this brings me back to hibernate caching. In the course of monitoring ourselves, I noticed that a certain query was happening WAY more than it should be, at least based on my gut feeling. The query in question loaded an object that is generally static -- save for one value that represents how often some data should be collected. It's the only mutable value, and once in place, it rarely changes.
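To make the rest of this concrete, here's a grossly simplified, hypothetical stand-in for the kind of object I'm talking about (the names are made up for this post, not our real model):

    // Hypothetical, simplified stand-in for the object in question.
    // Everything is effectively static once created, except the collection frequency.
    public class DataCollectionConfig {
        private Long id;              // surrogate primary key
        private String targetUuid;    // immutable, unique business key (this will matter later)
        private String metricName;    // static after creation
        private long frequencyMillis; // the one mutable value; rarely changes once set

        // getters/setters omitted
    }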
Hrm... how to debug. First, we checked the cache settings. Whoops -- WAY too low for both the cache expiry timeout and the max cache elements. Fix that. Still sucks. Some cursory hibernate source and log perusal showed that the cache for these objects was being invalidated at a rapid rate. Yes, the entire cache. Even though the objects are essentially static, the query cache takes the safe route and says any change to a related table invalidates any and all queries that reference that table. This makes sense, from a generic cache standpoint. But I thought to myself -- surely there has to be a way. *I* am smarter than hibernate in this case, and *I* can more rightly determine when the query results should be invalidated. Lucky for me, hibernate allows you to extend the StandardQueryCache "up to date" policy checks. w00t. I implemented one, overrode the timeout policy for the object(s) in question, and re-ran tests. FAILURE. Turns out I am not smarter than hibernate.
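For the curious, that extension point looks roughly like the sketch below. This is from memory against the hibernate 3.2-era cache SPI, so treat the signatures as assumptions and check your version; the factory class and region name are made up, and this is an illustration of the idea, not our actual code. You wire it in by setting hibernate.cache.query_cache_factory to the factory's class name.

    import java.util.Properties;
    import java.util.Set;

    import org.hibernate.HibernateException;
    import org.hibernate.cache.QueryCache;
    import org.hibernate.cache.QueryCacheFactory;
    import org.hibernate.cache.StandardQueryCache;
    import org.hibernate.cache.UpdateTimestampsCache;
    import org.hibernate.cfg.Settings;

    // Hypothetical custom "up to date" policy: for a region we believe is
    // essentially static, skip the table-timestamp invalidation check.
    // (Spoiler: this is the part where I was not smarter than hibernate.)
    public class LenientQueryCacheFactory implements QueryCacheFactory {

        public QueryCache getQueryCache(final String regionName,
                                        UpdateTimestampsCache updateTimestampsCache,
                                        Settings settings,
                                        Properties props) throws HibernateException {
            return new StandardQueryCache(settings, props, updateTimestampsCache, regionName) {
                protected boolean isUpToDate(Set spaces, Long timestamp) {
                    if ("query.essentiallyStatic".equals(regionName)) {
                        return true; // pretend these cached results are always fresh
                    }
                    return super.isUpToDate(spaces, timestamp);
                }
            };
        }
    }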
However, in the process of implementing the custom query cache policy, I had debugged through some more hibernate code and noticed that "natural id" queries are treated specially. Some more google-fu, and I quickly came across "Queries by Natural Identifier" in the hibernate docs.
Now, the docs just aren't real clear on what optimizations can be made internally by utilizing the Criteria with a natural id restriction. But, having just been in that section of code, I could correlate the two. Here's the meaty bit -- if you make a natural id / key lookup, and hibernate recognizes it as such, it can bypass the table timestamp invalidation and go directly to the second-level cache to fetch the object. Hibernate knows that, with an immutable and unique natural key, a table modification will not affect the composition of the object in question (of course, an object modification would, and the object would have been evicted from the L2 cache).
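In code, the lookup that gets this special treatment looks something like this, sticking with the hypothetical names from earlier; the important bits are Restrictions.naturalId() and setCacheable(true):

    import org.hibernate.Session;
    import org.hibernate.criterion.Restrictions;

    public class DataCollectionConfigDao {

        // Natural-id lookup: hibernate recognizes the restriction as a lookup by
        // the immutable, unique key and can serve it from the second-level cache
        // without consulting the table-timestamp check.
        public DataCollectionConfig findByTargetUuid(Session session, String uuid) {
            return (DataCollectionConfig) session
                .createCriteria(DataCollectionConfig.class)
                .add(Restrictions.naturalId().set("targetUuid", uuid))
                .setCacheable(true)
                .uniqueResult();
        }
    }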
I cannot overemphasize the utility of this discovery. You see, we were making frequent inserts into the table, but existing objects (rows) were almost never changed. Without the natural key lookup, though, every insert invalidated all results in the query cache. And there you have it: why I was seeing way more selects for the same objects than I had anticipated.
Some quick assurances that we had mapped the natural id correctly, some quick refactoring of the HQL into Criteria queries with natural id restrictions, and whammo, we're good. Let's run the tests and query analysis again... ruh roh. OH COME ON! (not my exact reaction, but I think you can guess what it was really like).
Confident that the natural id cache lookup optimization was what I really, really, really wanted, I figured there had to be something else going on. More debuggage ensued. I set a breakpoint near the same spot in StandardQueryCache where I had first noticed the query cache optimization. Lo and behold, the hibernate metadata for saying "I am a natural key lookup" was returning false.
I am not amused. I am confident my hibernate mapping is correct, because the unique index is present in the schema. Think. Think. Think. Well, I had recently been on an effort to move from the hbm xml mappings to hibernate annotations mappings. @NaturalId support was, in fact, the very reason I had recently upgraded the annotations jar. On a hunch, I reverted the persistence mapping back to the xml form -- and sure enough, the natural id lookup (and the cache optimization with it) came back.
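For reference, the annotations form looks roughly like this (hypothetical names again, and a sketch of the mapping style in question rather than our actual class). On paper it should be equivalent to the hbm natural-id mapping, but it was exactly this form that was not being recognized as a natural key lookup for me:

    import javax.persistence.Entity;
    import javax.persistence.Id;

    import org.hibernate.annotations.NaturalId;

    @Entity
    public class DataCollectionConfig {

        @Id
        private Long id;

        // The immutable, unique business key -- the annotations equivalent of
        // <natural-id mutable="false"> in the hbm.xml mapping I reverted to.
        @NaturalId(mutable = false)
        private String targetUuid;

        private long frequencyMillis;

        private boolean supported; // see the postscript below

        // getters/setters omitted
    }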
Being the good open-source citizen, I made a hibernate forum post that detailed my findings, including simplified sample code demonstrating the problem. The good folks on the hibernate forum (after questioning the silliness of my contrived example) were quick to recognize the problem, and I got a hibernate jira issue opened.
The workaround, obviously, is to leave the xml mapping in place until the fix makes it into a hibernate release. Not too bad of a deal, I guess, considering the overall win I now have in my cache hit ratio.
In conclusion -- if it makes sense for your data model, the natural id query cache optimization can be a huge performance win for your app. If you have immutable or rarely changed objects that you look up by a constant natural key, look into the Criteria natural id restriction. And use the xml mapping until the bug is fixed.
PS -- there is one other performance note to consider, actually. If you use the natural id query and it returns no rows, that NULL result will not be cached. So, if you have more of these 'not found' lookups than 'object/row found' results, you will still see tons of these queries that you don't expect. Either stop using the natural key optimization (if 'not found' really is the common case), or extend your object/schema to include a 'not supported' field. In our case, the lack of a row meant "not supported," and we also had a flag to cover the case where something had been supported but then went away. For things that were frequently "not supported," I simply went ahead and created the object/row anyway and set the flag to mark it "not supported" -- thus ensuring the natural key optimization was not subverted.
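Sticking with the hypothetical names one last time, that second option looks something like this -- a find-or-create variant added to the DAO sketch from earlier, assuming the usual accessors on the entity and the boolean supported flag shown above:

    import org.hibernate.Session;
    import org.hibernate.criterion.Restrictions;

    public class DataCollectionConfigDao {

        // If "no row" would be the common case, create the row up front and mark
        // it "not supported", so the natural-id lookup always finds (and caches)
        // a real object instead of an uncacheable null result.
        public DataCollectionConfig findOrCreate(Session session, String uuid) {
            DataCollectionConfig config = (DataCollectionConfig) session
                .createCriteria(DataCollectionConfig.class)
                .add(Restrictions.naturalId().set("targetUuid", uuid))
                .setCacheable(true)
                .uniqueResult();
            if (config == null) {
                config = new DataCollectionConfig();
                config.setTargetUuid(uuid);
                config.setSupported(false); // row exists, but flagged "not supported"
                session.save(config);
            }
            return config;
        }
    }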