Wednesday, June 18, 2008

Hibernate: Cache Queries the Natural Id Way

I work on the MySQL Enterprise Tools team, formerly of MySQL and now with Sun Microsystems. The 2.0 version of the Enterprise Monitor is well under way. As part of this, the Java server backend has been refactored to utilize Spring and Hibernate. Honestly, I didn't know either one of those technologies before starting this project. Oh, what a fun road it has been...

A big draw for using an off-the-shelf ORM was so that we didn't have to write our own (kind of bad and slightly wrong -- those darn transactions) caching implementations for the custom one-off ORM that existed previously. A lot of our internal meta-model is very static, so clearly caching would be a HUGE win for performance, right?

Not so fast, turbo. Let me continue...

The headline feature for the 2.0 Monitor is "Query Analysis." Coupled with the MySQL Proxy, the Monitor receives captured query data to/from a MySQL server. Once at the monitor, the data can be aggregated, analyzed, and reported upon. What better test for this feature than to use it on ourselves, to tune ourselves!

And this brings me back to hibernate caching. In the course of monitoring ourselves, I noticed that a certain query was happening WAY more than it should be, just based on my gut feeling. The query in question loaded an object that was generally static -- save for one value that represents the frequency of how often some data should be collected. Its the only mutable value, and once in place, it rarely changes.

Hrm... how to debug. First, we checked the cache settings. Whoops -- WAY too low for both the cache expiry timeout and the max cache elements. Fix that. Still sucks. Some cursory hibernate source and log perusal showed that the cache for these objects was being invalidated at a rapid rate. Yes, the entire cache. Even though the objects are essentially static, the query cache takes the safe route and says any change to a related table invalidates any and all queries that reference that table. This makes sense, from a generic cache standpoint. But I thought to myself -- surely there has to be a way. *I* am smarter than hibernate in this case, and *I* can more rightly determine when the query results should be invalidated. Lucky for me, hibernate allows you to extend the StandardQueryCache "up to date" policy checks. w00t. I implemented one, overrode the timeout policy for the object(s) in question, and re-ran tests. FAILURE. Turns out I am not smarter than hibernate.

However, in the process of implementing the custom query cache policy, I had debugged through some more hibernate code and noticed that "natural id" queries are treated special. Some more google-fu, and quickly I come across Queries by Natural Identifier in the hibernate docs.

Now, the docs just aren't real clear on what optimizations can be made internally by utilizing the Criteria with a natural id restriction. But, as I was just in that section of code, I could correlate it. Here's the meaty bit -- if you make a natural id / key lookup, and hibernate recognizes it as such, it can bypass the table timestamp invalidation and go directly to second-level cache to fetch the object. Hibernate knows, with an immutable and unique natural key, that a table modification will not effect the composition of the object in question (of course, an object modification would, and it would have been evicted from L2 cache).

I cannot overemphasize the utility of this discovery. You see, we were making frequent inserts into the table. But existing objects (rows) where changed almost never. But without the natural key lookup, the inserts invalidated all results in the query cache. There you have why I was seeing way more selects for the same objects than I had anticipated.

Some quick assurances that we mapped the natural id correctly, some quick refactoring of the HQL into Criteria queries with natural id restrictions, and whammo, we're good. Lets run the tests and query analysis again... ruh roh. OH COME ON! (not my exact reaction, but I think you can guess what it was really like).

Confident that the natural id cache lookup optimization was what I really really really wanted, there had to be something else going on. More debuggage ensued. I set a breakpoint near the same area in StandardQueryCache where I first noticed the query cache optimization in the first place. Lo and behold, the hibernate metadata for saying "i am a natural key lookup" was returning false.

I am not amused. I am confident my hibernate mapping is correct, because the unique index was present in the schema. Think. Think. Think. Well, I had recently been on an effort to move from the hbm xml mappings to hibernate annotations mappings. @NaturalId support was, in fact, the very reason I had recently upgraded the annotations jar. On a hunch, I reverted the persistence mapping back to the xml form. Debug, the metadata returns the correct value... test, and YES, finally -- the queries issued are in line with my expectations and the rows present in the database. I. Have. Won.

Being the good open-source citizen, I made a hibernate forum post that detailed my findings, including simplified sample code demonstrating the problem. The good folks on the hibernate forum (after questioning the sillyness of my contrived example) were quick to recognize the problem, and I got a hibernate jira issue opened.

The workaround, obviously, is leave the xml mapping in place until the fix makes it into a hibernate release. Not too bad of a deal, I guess, considering the overall win I now have in my cache hit ratio.

In conclusion -- if it makes sense for your data model, the natural id query cache optimization can be a huge performance win for your app. If you have immutable, or rarely changed objects with a constant natural key lookup -- look into the Criteria natural id restriction. And, use the xml mapping until the bug is fixed.

ps -- there is one other performance note to consider, actually. If using the natural id query, and it returns no rows, this NULL result will not be cached. So, if you have more of these than 'object/row found' results, you will still get tons of these queries that you don't expect. Either stop using the natural key optimization (if 'not found' is more common), or extend your object/schema to include a 'not supported' field. In our case, the lack of a row meant "not supported" and we had a flag "not supported" in case it was supported, but then went away. In those cases where something was frequently "not supported" I simply went ahead and created the object/row and just set the flag to false -- thus ensuring the natural key optimization was not subverted.


Hardy Ferentschik said...

Bug in Hibernate Annotations is fixed now. Haven't traced the code all the way into the caching part, but the meta data is correct now. I am intrigued to hear if this will solve your problem.

Oldag said...

I pulled the latest annotations src from their trunk, built, and ran with it. I'm pleased to say the @NaturalId mapping is fixed, and the query cache optimization works as expected.

You can read more about it back in the hibernate forum thread linked to above.

thanks again, Hibernate team!

Darin Keever said...

So.. if you write a long enough blog, someone fixes something? Guess I need to start blogging.. capriciously.

Irfan Shaik said...


We are facing a problem with memcache. We are using hibernate-memcached-version1.5-SNAPSHOT and spymemcached-provider-version3.0.2.

The following are the configuration


!-- Enable 2nd Level Cache --

property name="hibernate.cache.use_second_level_cache" value="true"
property name="hibernate.cache.use_query_cache" value="true"

!-- MemCache Configuration --

property name="hibernate.cache.region.factory_class" value="com.googlecode.hibernate.memcached.MemcachedRegionFactory"
property name="hibernate.cache.use_minimal_puts" value="true"
property name="hibernate.cache.use_structured_entries" value="true"
property name="hibernate.memcached.servers" value="${}"
property name="hibernate.memcached.cacheTimeSeconds" value="1800"


@org.hibernate.annotations.Cache(usage = CacheConcurrencyStrategy.READ_WRITE)

In GenericDao we are setting query hints cacheable to true for loadAll().

We are using loadAll() method to fetch all the records.
Whenever we made a request the loadAll query is executing and also the queries based on id are executing.

when i refer to the log am able to notice that the data is fetching from database and setting in memcache for a request and when we make another request instead of fetching the data from the memcache it is hitting again to the db and again setting to memcache. Am unable to understand that without modifying any data why it is hitting to db?

Please let me know we are missing anything.

for IT the said...

Amazing & Great informative blog,it gives very useful practical information to developer like me. Besides that Wisen has established as Best Hibernate Training in Chennai . or learn thru Online Training mode Hibernate Online Training | Java EE Online Training. Nowadays Hibernate ORM has tons of job opportunities on various vertical industry.

Unknown said...

Good Post! Thank you so much for sharing this pretty post, it was so good to read and useful to improve my knowledge as updated one, keep blogging.
Best Devops online Training
Online DevOps Certification Course - Gangboard

Aliya Manasa said...

I have read a few of the articles on your website now, and I really like your style of blogging. I added it to my favourites blog site list and will be checking back soon.
python Course in Pune
python Course institute in Chennai
python Training institute in Bangalore

tamizh said...

Thanks you for sharing this unique useful information content with us. Really awesome work. keep on blogging
Selenium training in Chennai
Selenium training in Bangalore
Selenium training in Pune
Selenium Online training

service care said...

I think this is a great site to post and I have read most of contents and I found it useful for my Career .Thanks for the useful information. Good work.Keep going.
oneplus service center chennai
oneplus service center in chennai
oneplus service centre chennai

Soumitasai said...

Outstanding blog thanks for sharing such wonderful blog with us ,after long time came across such knowlegeble blog. keep sharing such informative blog with us.

Check out : big data training in velachery
big data analytics training and placement
big data training in chennai chennai tamilnadu
big data workshop in chennai

IT Tutorials said...

Really useful information. Thank you so much for sharing.It will help everyone.Keep Post. RPA training in chennai | RPA training in Chennai with placement | UiPath training in Chennai | UiPath certification in Chennai with cost

Amirtha Gowri said...

It’s interesting content and Great work. Definitely, it will be helpful for others. I would like to follow your blog. Keep post

Check out:
best hadoop training in omr
hadoop training in sholinganallur
best institute for big data in chennai
big data hadoop course in chennai with placement

Diya shree said...

Good job and thanks for sharing such a good blog You’re doing a great job. Keep it up !!

PMP Certification Fees | Best PMP Training in Chennai |
pmp certification cost in chennai | PMP Certification Training Institutes in Velachery |
pmp certification courses and books | PMP Certification requirements |
PMP Training Centers in Chennai | PMP Certification Requirements | PMP Interview Questions and Answers

Kayal m said...

Nice post! I love this blog and I got more kinds of techniques in this topic. Thanks for your sharing.

Primavera Training in Chennai
Primavera Course in Chennai
Pega Training in Chennai
Unix Training in Chennai
Power BI Training in Chennai
Excel Training in Chennai
Oracle Training in Chennai
Social Media Marketing Courses in Chennai

vinith said...

Its as if you had a great grasp on the subject matter, but you forgot to include your readers. Perhaps you should think about this from more than one angle.
python training in bangalore

Amirtha Gowri said...

Wow! Really a nice blog. Thank you so much for you effort.

Check out:
reactjs interview questions pdf
reactjs training in chennai
react js tutorial

Amirtha Gowri said...

It’s interesting content and Great work. Definitely, it will be helpful for others. I would like to follow your blog. Keep post

Check out:
Selenium training courses in chennai
Selenium training center in chennai
Selenium training in chennai quora
Selenium course fees in chennai

Unknown said...

cool stuff you have and you keep Python classes in pune overhaul every one of us

Chris Hemsworth said...

The article is so informative. This is more helpful for our
software testing training courses
selenium classes Thanks for sharing

IT Tutorials said...

Get the most advanced Python Course by Professional expert. Just attend a FREE Demo session.
For further details call us @ 9884412301 | 9600112302
Python training in chennai | Python training in velachery

IT Tutorials said...

Get the most advanced AWS Course by Professional expert. Just attend a FREE Demo session.
For further details call us @ 9884412301 | 9600112302
AWS training in chennai | AWS training in velachery

Diya shree said...

I feel very grateful that I read this. It is very helpful and very informative and I really learned a lot from it.

blue prism training in chennai | blue prism training in velachery | blue prism training and placement | best training institute for blue prism | blue prism course fee details | Best Blue Prism Training in Credo Systemz, Chennai | blue prism certification cost | blue prism certification training in chennai | blue prism developer certification cost

Techy said...

Nice Post
For Data Science training in Bangalore, Visit:
Data Science training in Bangalore

Techy said...

For AI training in Bangalore, Visit:
Artificial Intelligence training in Bangalore

Techy said...

For Data Science training in Bangalore, Visit:
Data Science training in Bangalore