Monday, November 10, 2008

Hibernate Query Cache: A Dirty Little Secret

You Mean, Memory Is Not Infinite?

We're working hard getting MySQL Enterprise Monitor 2.0, featuring Query Analyzer, ready for release. As part of that, we started really ramping up the number of MySQL servers reporting in query data to see how we could scale. Not surprisingly (to me, anyway), the first efforts did not go so well. My old friend OutOfMemoryError reared its ugly head once again.

Query Cache -- It's More Than Just Results!

We're big (ab)users of hibernate query caching, and, more importantly to us, of the natural id optimized query cache. Firing up the profiler, I was not shocked to see that the second level (let's call it L2) and query caches were hogging the majority of memory. But, something didn't smell right...

What I was seeing was tons of object/entity references for our persistent objects. However, the hibernate cache does not really work that way. Hibernate 'dehydrates' query results and persistent objects into their primitive components and identifiers. It stores this decomposed data in the L2 and query results caches, and on a cache hit, it rehydrates/recomposes them into the requested persistent object.

But the profiler does not lie. One of our objects, for which there are conceptually 60 some odd instances, had over 20,000 referenced instances on the heap. Yikes. Obviously, we're doing something wrong, or are we... ?

Ok, Mr. Profiler, who is holding the references on the heap? Drill down a bit through the query cache, past some HashMap entries, into the... keys... keys, you say? Hrm, not values. Interesting. Well, looky here. Hibernate's QueryKey is holding on to an Object[] values array, which is holding the majority of our persistent object references. Ouch. In addition to that, it has a map inside of it whose values also contain a non-trivial number of references to our entities.

Well, nuts. Code spelunking ensues. QueryKey is just as it sounds -- an object that can act as a unique key for query results. This means it includes stuff like the SQL query text itself as well as any parameters (positional or named) that specifically identify a unique result set.

Objects, Objects, Everywhere

Now, silly me, since we are using an object relational mapping, I was using objects for the parameters in my HQL. Something along the lines of:
final Cat mate = ...;
final String hql = "from Cat as cat where cat.mate = ?";
final Query q = session.createQuery(hql);
q.setParameter(0, mate);
q.setCacheable(true);
In this case, the entire Cat mate (and everything he references) would be held in perpetuity. Well, until either the query cache exceeds his configured limits and it is evicted, or the table is modified and the results become dirty.

Let's not forget our friends the Criteria queries, either, because it is only through criteria that we can get the natural id cache optimization. (And please pardon the contrived-ness of the cat natural id example.)
final String foo = "something";
final Cat mate = ...;

final Criteria crit;
final NaturalIdentifier natId;

crit = session.createCriteria(Cat.class);
natId = Restrictions.naturalId();
natId.set("mate", mate);
natId.set("foo", foo);
crit.add(natId);
crit.setCacheable(true);
In the same fashion as the HQL, this will result in hard references to 'mate' and 'foo' held for the cache-life of the query results.

How To Make a Bad Problem Worse

Even worse, in our case, was the fact that we would do the equivalent of loading the same 'mate' over and over again (maybe this cat is severely non-monogamous). And whether loaded from L2 cache or directly from the database, the mate Cat now existed as multiple instances, even though they are hashCode()/equals() equivalent. But QueryKey in the query cache doesn't know that. He only knows what he is handed. And he is handed equivalent duplicates over and over and over again, and only lets go of them on cache eviction. So, not only do we end up with essentially unnecessary references to objects held onto by the query keys in the cache, we also instantiate multiple copies of the same object and hold on to those, too. Bear with me as I bow my head in shame...

Fix Attempt 1: Make a Smarter Cache

I've been down this road before. I tried to be smarter than Hibernate once before. It did not end well. Unsullied by prior defeat, I resolved to attempt being smarter than Hibernate once again!

Hibernate's query cache implementation is pluggable. So I'm going to write my own. Ok, I'm not going to write my own -- from scratch. My going theory is that I can at least eliminate the duplication of the equivalent objects referenced in memory. I'm going to decorate hibernate's StandardQueryCache and do the following: For each QueryKey coming in a cache put(), introspect the Object[] values (which are positional parameters to the query). For each object in values[], see if an equivalent canonical object has already been seen (same hashCode/equals()). If so, use the canonical object. Else, initialize the canonical store with this newly seen object.

Notice we only have to do this on put(). A get() can use whatever objects already come in, as they are assumed to be hashCode/equals equivalent. Hell, it HAS to work that way, otherwise QueryKey would just be broken from the start. Here are some snippets of the relevant code, from a class that implements org.hibernate.cache.QueryCache.
public boolean put(QueryKey key, Type[] returnTypes,
        @SuppressWarnings("unchecked") List result, boolean isNaturalKeyLookup,
        SessionImplementor session) throws HibernateException {

    // duplicate natural key shortcut for space and time efficiency
    if (isNaturalKeyLookup && result.isEmpty()) {
        return false;
    }

    // swap any previously-seen equivalent objects into the key before caching it
    canonicalizeValuesInKey(key);

    return queryCache.put(key, returnTypes, result, isNaturalKeyLookup,
            session);
}

private void canonicalizeValuesInKey(QueryKey key) {
    try {
        // QueryKey does not expose its positional parameter values, so reflect into it
        final Field valuesField;
        valuesField = key.getClass().getDeclaredField("values");
        valuesField.setAccessible(true);
        final Object[] values = (Object[]) valuesField.get(key);
        canonicalizeValues(values);
    } catch (Exception e) {
        throw new RuntimeException(e);
    }
}

private void canonicalizeValues(Object[] values) {
    synchronized (canonicalObjects) {
        for (int i = 0; i < values.length; i++) {
            Object object = values[i];
            Object co = canonicalObjects.get(object);
            if (co == null) {
                // first time we've seen an object equal to this one -- it becomes the canonical instance
                co = object;
                canonicalObjects.put(object, co);
            } else if (co != object) {
                // an equals()-equivalent duplicate came in; the pre-existing canonical object is used instead
            }
            values[i] = co;
        }
    }
}
It's pretty much what I described. I didn't even attempt to get permission to post the whole thing, because it is probably not worth my time. The only things missing are a HashMap of the canonical objects and the instantiation of the StandardQueryCache queryCache. You'll also need to implement org.hibernate.cache.QueryCacheFactory to create this smarter query cache, and then plug that into your hibernate config.
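For completeness, here's roughly what the factory plumbing could look like. This is a minimal sketch against the Hibernate 3.2-era QueryCacheFactory contract; CanonicalizingQueryCache is just a hypothetical name for the decorator described above.

import java.util.Properties;

import org.hibernate.HibernateException;
import org.hibernate.cache.QueryCache;
import org.hibernate.cache.QueryCacheFactory;
import org.hibernate.cache.StandardQueryCacheFactory;
import org.hibernate.cache.UpdateTimestampsCache;
import org.hibernate.cfg.Settings;

// Sketch only: build the standard query cache, then wrap it in the
// canonicalizing decorator shown above.
public class CanonicalizingQueryCacheFactory implements QueryCacheFactory {

    public QueryCache getQueryCache(String regionName,
            UpdateTimestampsCache updateTimestampsCache, Settings settings,
            Properties props) throws HibernateException {
        QueryCache standard = new StandardQueryCacheFactory().getQueryCache(
                regionName, updateTimestampsCache, settings, props);
        return new CanonicalizingQueryCache(standard);
    }
}

Wire it in with the hibernate.cache.query_cache_factory property (or the equivalent entry in your Spring/hibernate configuration), pointing at your factory class.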

This did work as expected. My 'outstanding' objects on the heap were greatly reduced. Unfortunately, it was not good enough. I still had thousands of these guys on the heap, essentially unused except to, at some point, supply their numeric id for hibernate's generated SQL. And this didn't take care of the named parameters, which are stored in a map of string names to some other internal hibernate class, which I no longer felt like introspecting via reflection. So anything using named parameters was still potentially duplicated.

Fix Attempt 2: Objects? Who Needs Objects?

Hrm, the last paragraph stirred a thought -- hibernate only needs the ids from these objects. While the academic in me enjoyed the exercise in decorating the query cache to be smarter about duplicate references, the idiot in me said "well, duh. if you only need the id, why not just use the id?" Because we're supposed to use objects! Oh well. It occurred to me that I could rewrite any and all HQL to reference object ids instead of the object property reference itself. It should end up in the same SQL eventually sent to the database. Seems like a cheesy code monkey workaround, but the theory is that hibernate's QueryKey will only be holding onto Integer and Long references instead of entire object entities.

So, I hunt down all of our cacheable queries. I change them all to object id queries. The previous Cat business can now look like this:
final Cat mate = <...>;
final String hql = "from Cat as cat where cat.mate.id = ?";
final Query q = session.createQuery(hql);
q.setParameter(0, mate.getId());
q.setCacheable(true);
cat.mate becomes cat.mate.id. The mapped parameter becomes mate.getId(). It could be a named parameter just as well. I didn't find a single HQL in our application that I could not convert this way. Good.

But what about our friend Criteria? He requires objects for the properties, right? At first, I thought this was true. And I was consoled by the fact that my smart query cache would do its best to keep duplication at a minimum. In fact, it was the next day (after resting my brain) that another "I'm an idiot" moment came to light. The Criteria API still just takes strings for property references. Perhaps it follows the same property resolution rules as HQL? In other words, instead of "mate", can I say "mate.id"? And the answer is, YES, yes I can! Woo hoo! Absolutely no more object references for me! Here is what the criteria would look like:
final String foo = "something";
final Cat mate = ...;

final Criteria crit;
final NaturalIdentifier natId;

crit = session.createCriteria(Cat.class);
natId = Restrictions.naturalId();
natId.set("mate.id", mate.getId());
natId.set("foo", foo);
crit.add(natId);
crit.setCacheable(true);
Subtle, yes. But trust me, it makes a huge difference. Interestingly, Fix Attempt 2 likely alleviates the need for Fix Attempt 1. At the worst, we end up with lots and lots of Integer/Long object references, and even duplicates of them. The profiler says it is not very much in my limited testing. But I decide to leave the smarter cache in, because it appears to be working, and it does reduce memory.

Lessons Learned

If you use hibernate query caching, and actually want your memory spent on caching useful results rather than wasted on overhead, follow some simple advice:
  • Write your HQL queries to use identifiers in any substitutable parameters: WHERE clauses, IN lists, etc. Using full objects results in the objects being kept on the heap for the life of the cache entry.
  • Write your Criteria restrictions to use identifiers as well.
  • Use the 'smart' query cache implementation skeleton in this article to eliminate duplicate objects used in query keys. This helps a little if you use the ids in your HQL/Criteria, but if you still must use objects then it helps a lot.
  • If you use a smart query cache implementation, prefer '?' positional parameters over named parameters. I know, I know. I love named parameters too. But the smart query cache implementation in this article only detects duplicates for positional parameters (or params in criteria). Alternatively, feel free to extend the example to locate duplicates in the named parameter map and replace them with canonical ones as well.

I'm Not the Only One Who Noticed...

In doing some final research for this post, I came across hibernate core issue HHH-3383. Let's keep an eye on it to see if the wonderful hibernate devs can fix this down at their level, so we don't have to change our code. Also, the issue claims that Criteria queries cannot be fixed with the same 'use id' workaround. Since I was able to, I wonder if the bug submitter did not realize you can reference dotted property paths in criteria restrictions exactly as you can in HQL. Perhaps I shall start a dialog with him.

Friday, August 29, 2008

MySQL Query Analyzer: A Peek Under the Hood

I had a hard time coming up with a title for this blog post. I really wanted to capture its essence. Several discarded titles include, but are not limited to:
  • How It Works
  • The Gory Details
  • Everything You Never Wanted to Know
  • The Insider's Perspective
I picked the lamest one, hoping that it is catchy enough to ensnare the unfortunate soul that comes across my blog. Just in case you don't know by now, I work for MySQL (ok, Sun) on the Enterprise Tools team. We're approaching (some sort of) release of MySQL Enterprise Monitor 2.0, with the headline feature Query Analysis.

What I want to achieve with this article is to give definitions, data, flow, etc., that describe how we look at things internally in the architecture of the application. This is intended to give the hapless reader an idea of what Query Analysis can do, and consequently what it cannot.

What is a query?


I actually call them statements inside the application. To me, query seemed to imply asking about something. And we track more than that. Is an INSERT or UPDATE a query? Is a CREATE TABLE a query? Not in my feeble mind. So, I chose statement because it is generic enough to encompass them all, yet seems still to have ties to database vernacular. Too bad "statement analysis" doesn't quite have that sexy marketing appeal.

A unique statement/query is an application's query that is normalized and executes against a given database. By normalized, I mean all literals are replaced with '?', (some) comments are stripped, the SQL itself is capitalized, and extraneous whitespace is collapsed.

There is a subtle phrase in there that is very important -- "a unique statement ... executes against a given database." Basically, we identify a statement key as {database, normalized text}. That means if you have statement SELECT 1 and execute it in the context of database 'test', switch databases to webapp_foo, and execute it again, in the mind of MySQL Query Analyzer it is a different statement.

This works well for things like replication scale out, where a slave farm will have bunches of queries executing against the same database(s). It breaks down if, somehow, your application uses multiple databases, but yet you really want the queries to be treated the same for aggregation. We haven't found this as the typical use case (yet), but if we get feedback we can revisit this in future releases.
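To make the normalization and keying concrete, here's a rough sketch in Java. This is illustrative only -- the real normalization happens in the proxy (on the lua/C side) and handles far more cases; the regexes here are deliberately simplistic, and keyword capitalization and comment stripping are skipped.

import java.util.regex.Pattern;

// Illustrative sketch of a {database, normalized text} statement key.
// The real normalization lives in the proxy and is far more thorough.
public class StatementKey {

    private static final Pattern STRING_LITERAL = Pattern.compile("'[^']*'");
    private static final Pattern NUMBER_LITERAL = Pattern.compile("\\b\\d+\\b");
    private static final Pattern WHITESPACE = Pattern.compile("\\s+");

    private final String database;       // may be null -- not every query runs in a database context
    private final String normalizedText;

    public StatementKey(String database, String rawSql) {
        this.database = database;
        this.normalizedText = normalize(rawSql);
    }

    private static String normalize(String sql) {
        String s = STRING_LITERAL.matcher(sql).replaceAll("?");
        s = NUMBER_LITERAL.matcher(s).replaceAll("?");
        s = WHITESPACE.matcher(s).replaceAll(" ").trim();
        return s; // keyword capitalization and comment stripping omitted for brevity
    }

    public String getDatabase() {
        return database;
    }

    public String getNormalizedText() {
        return normalizedText;
    }

    // identity (equals/hashCode) would be over {database, normalizedText}
}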

A statement that executes against a specific MySQL server instance I call a 'statement summary,' because it is at this level that we actually track and summarize execution statistics. More on this later...

Where does the data come from?

Enter the MySQL Proxy, with some enterprise-only plugins and lua scripts. The proxy sits between your application and your MySQL database server. All queries passing through it are captured, normalized, and stored until they are reported back to the Monitor server. The infamous agent is now a plugin that shares a process inside the chassis, alongside the proxy. Proxy and agent communicate, in this case via lua, in order to push query data to the agent plugin so the agent can send it up to the monitor in the same fashion as any other data it collects.

There are potentially other sources of data... but that's a story for a different day.

What data is tracked?


For a given statement, we track:
  • number of executions
  • execution time
  • min/max execution time(s)
  • rows returned/affected
  • min/max row(s)
  • result set size
  • min/max result set size
The normalized statement is tracked by the proxy. Each statement that fits in that normalized bucket is added to the ongoing statistics. The Monitor backend periodically asks for a snapshot (currently every minute), at which point the proxy resets his statistical aggregation (think, "read-clear" semantics).
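Conceptually, each normalized statement's counters behave something like this toy Java sketch (the real bookkeeping is in the proxy, not Java):

// Toy illustration of the "read-clear" snapshot semantics: statistics
// accumulate between collections, and reporting a snapshot resets them.
public class StatementCounters {

    private long count;
    private long execTimeMillis;
    private long minExecTimeMillis = Long.MAX_VALUE;
    private long maxExecTimeMillis;

    public synchronized void record(long execMillis) {
        count++;
        execTimeMillis += execMillis;
        minExecTimeMillis = Math.min(minExecTimeMillis, execMillis);
        maxExecTimeMillis = Math.max(maxExecTimeMillis, execMillis);
    }

    // called once per collection cycle (currently every minute): hand back
    // the totals since the last call and clear them
    public synchronized long[] snapshotAndReset() {
        long[] snapshot = { count, execTimeMillis, minExecTimeMillis, maxExecTimeMillis };
        count = 0;
        execTimeMillis = 0;
        minExecTimeMillis = Long.MAX_VALUE;
        maxExecTimeMillis = 0;
        return snapshot;
    }
}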

Also, some extra metadata is kept with the query, for housekeeping reasons:
  • database (which can actually be null - not every query executes in a database context)
  • normalized query text
  • a hash of the text
  • query type (select, insert, update, etc.)
What does the agent/proxy actually send to the Monitor?

Actually? You don't want to know. When you get your hands on it, you can turn up the agent log level to debug and see for yourself. It's not like it's a secret. It's just not worth mentioning here.

Conceptually? Now there's a better topic... Although we don't take full advantage of it in all agent communications, we've moved to a more RESTful approach to agent interaction. Internally, we take the legacy xml agent protocol and turn that into sexy 'representations' that we put to internal REST APIs. An example JSON representation for the query data looks something like this:
PUT /instance/mysql/statementsummary/f8ccc1d2-f63b-4c30-9d42-1d383e626510.em2.02a6ed17a7627d1777f33a6062826951
{
    "name": "f8ccc1d2-f63b-4c30-9d42-1d383e626510.em2.02a6ed17a7627d1777f33a6062826951",
    "values": {
        "bytes": 325,
        "count": 8,
        "database": "em2",
        "exec_time": 3174,
        "max_bytes": 45,
        "max_exec_time": 613,
        "max_rows": 1,
        "min_bytes": 38,
        "min_exec_time": 394,
        "min_rows": 1,
        "query_type": "SELECT",
        "rows": 8,
        "text": "SELECT hibtag0_ . tag_id AS tag1_18_ , hibtag0_ . tag AS tag18_ FROM tags hibtag0_ WHERE hibtag0_ . tag = ?",
        "text_hash": "02a6ed17a7627d1777f33a6062826951"
    }
}
This is our conceptual representation of a statement summary (remember, a statement summary is a summarized statement execution on a single mysql instance). Statements themselves do not exist on the proxy/agent side or in the communications, since the agent is always tracking a specific mysql instance. The statement itself is purely a Monitor-server side concept that we track for rollup purposes. The UUID you see in the RESTy URI is the mysql.inventory unique identifier, something we had to introduce in 2.0 to identify, without any uncertainty, the specific mysql instance we are monitoring. Then come the database name and the text hash. The uuid.db.text_hash defines the key for this statement summary.

What does the Monitor do with the statement summaries?


From the statement summary, we pull out the pieces which define a statement, which happen to be immutable: the database, query text, text hash, and query type. After storing that, we create a statement summary, which basically just ties the statement to a server instance. We also store the creation time of this object/row, so we can track the first time this query was ever seen on a server (or ever seen, period, when you take the min and group by the statement). Finally, we take the ongoing snapshot data and store that.

Tell me how the aggregation works.

Aggregation works on two levels. Because we split out immutable statement data, and statement summary (per server) data, we can then aggregate on each of them.

Let's revisit what these snapshots mean. Each statement summary data instance is a snapshot of the summarized statistics for the normalized query over the time period since the last time it was reported. This gives us the powerful ability to then aggregate these snapshots over an arbitrary time range chosen by the end user.

Consider the single server aggregation case. If count is the number of executions in a given minute, and we store this count snapshot over many minutes, then the total count for an arbitrary time range is merely the sum of the counts (to the minimum granularity of our collection frequency, which is currently one minute). Rows and result set size aggregation work exactly the same way.

The mins and maxes aren't magical either. The mins/maxes per snapshot are for that one-minute range. If you want the mins/maxes over the larger requested range, just take the min of the mins and the max of the maxes.

There is one "trick" on the monitor-side to show averages over the arbitrary range. We show average execution time, average rows returned/affected, and average result set size. What does an 'average' mean in this case? We want an average per query execution (not per unit time). Average execution time would be the total execution time divided by the total executions. Well, we aggregate execution time, and we aggregate count... so we just take those columns in the query we build, and write an expression column of time/count. There's your average. Same for rows and bytes -- just take the aggregated totals and divide by the count.

Now, what about summaries across multiple servers? Since we have the immutable statement data broken out, it's just another aggregation on top of the per-server one, grouped by the statement identity. And the same sum, min, max, and average tricks still apply.

Can I see statistics per user?


Not at this time. The user stats might be pretty easy to tack on in the next release. It would just become another element to the {database, query} key. Then we could still selectively aggregate at the user level, the server level, or the statement level, or any combination.

Sounds neat! When can I see it?

Currently, query analysis with Monitor 2.0 is in limited beta to select enterprise customers. Email beta-enterprise@mysql.com for more information. There is also a special MySQL Query Analysis forum. It has some good links, and as people get their hands on it, I bet it becomes the central place to go for all things Query Analysis.

What does Query Analysis in MySQL Enterprise Monitor 2.0 look like?

Check out our product manager's blog. He's better at doing touchy feely marketing flap like screen shots. ;)

Conclusion


Here, I covered all the gory details: what data we collect, where we collect it from, and how we conceptually view and aggregate it on our backend. I'd like to touch on how the UI interacts with the data -- specifically group views, server views, sorting, searching, and aliasing -- but I think this article has hit capacity. Perhaps a blog for another day...

Friday, July 25, 2008

Code Formatting Manifesto

First, let me say that there is only "one true format" for any given programming language. And, of course, that is my format. Then again, the one true format for you -- is your preferred format.

How do we solve this problem today? For any personal projects, one uses his/her one true format, as they are the masters of their own destiny. For shared projects, job related or otherwise, we often rely on code formatting standards.

Now, I'm not anti-coding standard by any means. I'm also not strict on what I feel is the one true format in any given situation. We have human minds that are adaptable. We are trained in the syntax and layout of code, and what it means. If a brace starts on this line or the next, I can grok it. However, there is something to be said for consistency within a project. I believe it boosts productivity, and that's why I'm not anti-coding standard(s).

But why does this have to be so? The meaning of the code itself doesn't change depending on where you put the brace (as long as the semantic structure doesn't change). The compiler doesn't care.* Why should we?

I think it all stems from the fact that the format of code is inherently tied to how it is stored -- as unstructured text documents. Can we do better than that?

How many years now has the visual representation of some document been isolated from its storage format? I'm too young to remember the beginnings of TeX and other typesetting languages. Throw in WYSIWYG word processing next. The office word processor that shall remain nameless used a binary storage format for a long time, only recently switching to some XML internal structure (perhaps compressed, but still XML). And OpenOffice.org definitely uses structured text (xml, compressed on disk), to describe documents. In the case of (most) WYSIWYG word processing, though, you are describing layout instead of structure, but there are good examples where that is not the case (I leave that as an exercise to the reader).

I give you the quintessential example -- HTML. HTML describes documents, usually intended for human viewing, in a structured fashion. How they are *visually* represented is completely up to the renderer! Hell, it doesn't even have to be visual. Sight-impaired folk can have their HTML documents read to them.

An HTML page rendered on my phone has the same content as one rendered for Firefox, but they look completely different. Even HTML on a given website can be rendered differently, if the designers were so forward thinking as to make the page skinnable via dynamic CSS changes. Again, the content is the same. Only the presentation changes.

Now, I come full circle. Why can't this *very same* (old) idea be applied to code? Let's remove the storage format from the presentation/editing format. I argue that we should be able to store, for example, Java code with all unnecessary whitespace removed. Load it up in your fancy new rendering editor, and your rendering/formatting preferences are applied for the visualization and editing of the file. When you save it, the editor does the opposite -- remove any formatting-only specific text, and save it in "canonical" form.

Syntax highlighting is arguably the (minimalist) first step. The colors on your keywords are not stored in the file. They are added by the editor/IDE as part of visualizing the code. Some editors are simple text-matching stupid, but others grok the structure and "do the right thing."

Next, there are plenty of "pretty printer" reformatting tools out there. Eclipse does it. And I believe there are other tools for other languages that do it. People use them to enforce coding standards as an "on commit" step into the source code repository. Code checked in is automatically run thru a formatter and is committed in canonical form.

Well, I say screw all that jumping thru hoops. Let's make this the editor's job. If we can already syntax highlight, and thusly grok real code structure, and already auto-format to configurable specifications, then let's take it to the next step. Let the editor do it on every load and save.

The one argument I see against this is a potential for losing nice diff-ability. If we store in 'canonical' format (perhaps compressed to minimal whitespace), and I want to diff revisions, it makes it slightly more difficult. The diff tool would need to understand the rendering process as well, and thus you might likely have to use some diff feature built into the editor. Otherwise, your diff tool of choice would need the same renderability from the canonical format to your rendered view of choice. Again, I'll use eclipse as my argument -- it provides more of a structural diff view anyway (not just +++ --- text added/removed stuff). Since it already understands the code structure of new vs old (regardless of view and storage format), it shouldn't have any problems if the stored format is not the same as the viewed format. The idea actually plays BETTER with this kind of diff, because you see actual structural changes and not just text format changes. Line ending changes and JoeBob Teammate's goofy reformatting no longer show up as diffs, and potentially don't even need to be saved because no *content* has actually changed. This is a good thing.

Anyway, I'm curious what the hapless reader of this blog thinks. I tried some cursory googling, but nothing following my ideas comes across in terms of actual programming. There are plenty of pretty printers and web code-sample displayers etc. These all have the same end goal as my idea, but none take it back to the actual editing step. Do you know of such a tool with the features I desire?

If I ever get magical "free time" I might play with some eclipse code to see if my idea would work. The pieces all appear to be there... just gotta knit them together. Yay, open source!


* This argument only works for stream oriented languages. I'll ignore python for the moment, but any language where whitespace/indenting is meaningful doesn't deserve acknowledgment anyway.

MySQL and the Missing Rows

I was doing some multi-threaded, multi-transactional testing for the backend of the MySQL Enterprise Monitor. I came across a weird failure, where it appeared I was able to successfully insert a row, and then (in the same transaction), a select from the same table did not return any rows.

Consider the following transactions:


mysql> create table t1 (id integer primary key auto_increment, name varchar(32) unique) engine=innodb;

a> begin;

b> begin;
b> select * from t1;

a> insert into t1 values(null, 'oldag');

b> insert into t1 values(null, 'oldag') on duplicate key update id=LAST_INSERT_ID(id);
(b blocks)

a> commit;

(b is released)
...
Query OK, 0 rows affected (0.00 sec)
b> select * from t1;
Empty set (0.00 sec)


what, what, what?!?!

So, the gist is that the insert appears to succeed (notice the ok, no warnings or errors), but it doesn't affect any rows. Surely, this cannot be correct. So, let's perform a thought experiment.

B begins a transaction, and selects from t1. He locks in his transaction read view (repeatable read isolation level) at whatever it was right then. A then makes a modification in another transaction. B then does the insert, but it is *not* actually an insert. A normal insert would have failed with a duplicate key constraint failure. But the lovely "on duplicate key update" feature turns that into an update request on the existing row with the constraint. But wait -- notice the update is designed to essentially be a no-op. The LAST_INSERT_ID trick is used so that hibernate gets the correct id for the now-not-inserted row (call it poor man's lazy create). Additionally, MySQL has the optimization that, if an update really wouldn't change anything, then by god -- don't change anything! Hence, the "0 rows affected."

Where does that leave us? We have a frozen transaction read view. We have an apparently successful insert (data manipulation) that should be visible in the same transaction. Due to internal trickery, no data manipulation actually occurred. Thusly, the 'B' transaction still has the same frozen read view as before the insert, and no rows appear in the select.

So, you can argue with me all day about the technical internals of what the database engine is doing, and why it is correct, and why it would be hard to change. But I'll argue back that the client experience is unexpected and therefore wrong. Furthermore, an insert with no errors (or warnings) is not visible immediately after a successful return. I don't care that there were zero rows affected. That is accurate. There WERE zero rows affected. So what. Insert. Select. Show me my rows!

So, what actually may make it impossible... Consider if transaction A had modified/inserted a lot more rows, but then B only inserts a subset of those. How do you make only those rows visible (the ones B inserted/updated) but not the ones that B didn't 'touch'? Granted, it's a hard problem, and in pure transactional database land, it might just be impossible. And "on duplicate key update" is a mysql-ism that throws a kink in the whole works.

Ok, now that I've told you it is impossible, I will tell you how to make it work. ;)

There is a workaround. Remember how I said that the update is a no-op if nothing is actually updated? Well, what if we actually FORCE some kind of update to happen. In this case, I added a 'dummy' insert_count column, starting at zero. I then changed it to "on duplicate key update id=LAST_INSERT_ID(id), insert_count = insert_count + 1". This forces an update to occur. The data manipulation is recognized, and the row becomes visible in the transaction. It occurred to me soon after that, that I should probably be doing hibernate 'versioning' on the objects with optimistic locking anyway, and maybe that would play nicely with this.
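In mysql-client terms, and assuming t1 has grown the dummy insert_count column (DDL omitted), transaction B's statement becomes something like:

b> insert into t1 values(null, 'oldag', 0)
   on duplicate key update id=LAST_INSERT_ID(id), insert_count = insert_count + 1;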

Wednesday, July 9, 2008

Oh Noes -- My Mac is infected!

... with a Windows virus, or with Windows?

While running skype on my macbook pro, I got this lovely little instant message...

WINDOWS REQUIRES IMMEDIATE ATTENTION
=============================

ATTENTION ! Security Center has detected
malware on your computer !

Affected Software:

Microsoft Windows Vista
Microsoft Windows XP
Microsoft Windows 2000
Microsoft Windows NT Server 4.0
Microsoft Windows Win98
Microsoft Windows Server 2003

Impact of Vulnerability: Remote Code Execution / Virus Infection /
Unexpected shutdowns

Recommendation: Users running vulnerable version should install a repair
utility immediately

Your system IS affected, download the patch from the address below !
Failure to do so may result in severe computer malfunction.

[url removed]
I find that just absolutely hilarious. What a grand scheme -- spam random skype addresses with some windows virus message. For what it is worth, I also get the same kind of random windows-specific virus warnings when visiting certain websites.

Good thing I'm a smug mac bastard and don't have to worry about such things (for now).

Saturday, July 5, 2008

Watch out for the "Texas School for the Blind" van

The title may at first appear to be a joke. But as we were leaving a July 4th fireworks show last night, a white van cut us off in the melee of traffic that clearly everyone was having to deal with while leaving the park. Sure, people cut you off. It happens. No need to get angry.

Then -- I saw it. The van had a giant Texas School for the Blind placard on the side. First, some minor chuckles. Then a real 'lol' moment. The irony of being cut off by *this* specific van was too much. And yes, hah hah -- I get that a blind person was not driving. It's still funny. Boo-hiss on you.

Then the real question settled in -- not to be insensitive, but what were they doing at a fireworks display anyway... listening to them go off?

Genius Bar: a.k.a. Mac'tard Central

I succumbed to the infamous burnt power supply issue with my MacBook Pro. After two years of solid use and abuse, it finally gave up the ghost.


I'm still covered under AppleCare pro support, and as this is a known issue -- I figure getting a new adaptor is a no-brainer. A quick call to the AppleCare support line, and the nice gentleman on the other end agrees. We make an appointment for the next day at the local apple store. In the mean time, I head to Fry's to pick up a new supply, as this is my work machine. My battery is also crappy and lasts less than an hour, and I've got plenty more work to do than that.

The next day, I head to my "Genius Bar" appointment around lunch time. Right on time, my name is called. This is the last thing the genius does well.

First, he has no idea why I'm there. Second, he has no idea I'm currently under AppleCare pro coverage. The phone support representative said both of these things would be "in the system" and I'd be taken care of right away. There are three possibilities here. Either the genius didn't check the system to see why I was there, or the phone support person did not put the info in the system, OR... there is no system at all.

Next, I have to convince the mac'tard -- oops, genius -- that I'm really covered by apple care. I brought my laptop just in case, but I also had my original apple care coverage certificate. The mac'tard says "duhhh, errrr... I'm not sure it has the info I need to look up your information. Can I just get the serial number from your computer?" Look, mac'tard -- right THERE (I show him) -- that IS the serial number. "duhhh, errrrr --- let me enter it in and see if it works." Needless to say, it was the correct serial number. The astonished "genius" then declares me valid and legal.

Now, as if I was not having enough fun already...

I finally get the requisite "so, what seems to be the problem." Eyes rolling, I said "'scuse me, while I whip this out" (boy, I wish I really did say that). I show him my obviously charred power adaptor. The genius is perplexed. He turns it over, and around, and flexes the cable, smells it, etc. I'm surprised he didn't stick his tongue on the burnt part to taste it.

After several minutes of deep contemplation, the answer comes: "I believe your adaptor shorted out." WOW. I bow and kneel in the awesome presence of the obviously superior mac-intellect of the certified Genius standing before me. Clearly, I am not worthy and should run from the store screaming giddy joy like a little school girl.

Due to my obviously lesser mac-intellect, I now get the obligatory/derogatory solution that these geniuses must be trained to dole out. "Sir, we will replace your adaptor THIS time... but let me show you how to wrap your cord so it doesn't happen again. See, you need to loop the thin cord before wrapping it around the posts..." I stop him mid-sentence (and yes, this part is true). I pull out my brand-new adaptor (purchased the night before) from my laptop bag. It has the very stress-reducing loop of which he was so graciously attempting to inform me. I actually say "I'm not retarded. I've done it this way for TWO YEARS." [blank stare -- both directions] "uhhhh, ok. let me get that part for you."

Finally convinced of his mac'tardedness, the genius fetches a new power brick from the back. Sign here, date that. And, I'm done.

So, I get that the genius bar probably spends 99% of the time dealing with regular idiots. I mean, we are talking about the apple store here. But I'm quite certain, when I show you a burnt, charred power adaptor -- you should just fetch me a new one.

At the end of the day, I now have two power supplies -- which actually makes life a tad easier for this nomadic worker. My primary work-spot at home now has a permanent adaptor. Setting up and leaving that spot is slightly less of a burden.

Apple support sent me a follow-up email, asking me to rank the "experience." I'm definitely going to reply with the appropriate negative feedback, but boy am I tempted to just link to this post...

Wednesday, June 18, 2008

Hibernate: Cache Queries the Natural Id Way

I work on the MySQL Enterprise Tools team, formerly of MySQL and now with Sun Microsystems. The 2.0 version of the Enterprise Monitor is well under way. As part of this, the Java server backend has been refactored to utilize Spring and Hibernate. Honestly, I didn't know either one of those technologies before starting this project. Oh, what a fun road it has been...

A big draw of using an off-the-shelf ORM was that we didn't have to write our own caching implementation (kind of bad and slightly wrong -- those darn transactions) like the one in the custom one-off ORM that existed previously. A lot of our internal meta-model is very static, so clearly caching would be a HUGE win for performance, right?

Not so fast, turbo. Let me continue...

The headline feature for the 2.0 Monitor is "Query Analysis." Coupled with the MySQL Proxy, the Monitor receives captured query data to/from a MySQL server. Once at the monitor, the data can be aggregated, analyzed, and reported upon. What better test for this feature than to use it on ourselves, to tune ourselves!

And this brings me back to hibernate caching. In the course of monitoring ourselves, I noticed that a certain query was happening WAY more than it should be, just based on my gut feeling. The query in question loaded an object that was generally static -- save for one value that represents the frequency of how often some data should be collected. It's the only mutable value, and once in place, it rarely changes.

Hrm... how to debug. First, we checked the cache settings. Whoops -- WAY too low for both the cache expiry timeout and the max cache elements. Fix that. Still sucks. Some cursory hibernate source and log perusal showed that the cache for these objects was being invalidated at a rapid rate. Yes, the entire cache. Even though the objects are essentially static, the query cache takes the safe route and says any change to a related table invalidates any and all queries that reference that table. This makes sense, from a generic cache standpoint. But I thought to myself -- surely there has to be a way. *I* am smarter than hibernate in this case, and *I* can more rightly determine when the query results should be invalidated. Lucky for me, hibernate allows you to extend the StandardQueryCache "up to date" policy checks. w00t. I implemented one, overrode the timeout policy for the object(s) in question, and re-ran tests. FAILURE. Turns out I am not smarter than hibernate.

However, in the process of implementing the custom query cache policy, I had debugged through some more hibernate code and noticed that "natural id" queries are treated special. Some more google-fu, and quickly I come across Queries by Natural Identifier in the hibernate docs.

Now, the docs just aren't real clear on what optimizations can be made internally by utilizing the Criteria with a natural id restriction. But, as I was just in that section of code, I could correlate it. Here's the meaty bit -- if you make a natural id / key lookup, and hibernate recognizes it as such, it can bypass the table timestamp invalidation and go directly to the second-level cache to fetch the object. Hibernate knows, with an immutable and unique natural key, that a table modification will not affect the composition of the object in question (of course, an object modification would, and it would have been evicted from L2 cache).

I cannot overemphasize the utility of this discovery. You see, we were making frequent inserts into the table, but existing objects (rows) were changed almost never. Without the natural key lookup, though, the inserts invalidated all results in the query cache. That is why I was seeing way more selects for the same objects than I had anticipated.

Some quick assurances that we mapped the natural id correctly, some quick refactoring of the HQL into Criteria queries with natural id restrictions, and whammo, we're good. Let's run the tests and query analysis again... ruh roh. OH COME ON! (not my exact reaction, but I think you can guess what it was really like).

Confident that the natural id cache lookup optimization was what I really really really wanted, there had to be something else going on. More debuggage ensued. I set a breakpoint near the same area in StandardQueryCache where I had first noticed the query cache optimization. Lo and behold, the hibernate metadata for saying "I am a natural key lookup" was returning false.

I am not amused. I am confident my hibernate mapping is correct, because the unique index was present in the schema. Think. Think. Think. Well, I had recently been on an effort to move from the hbm xml mappings to hibernate annotations mappings. @NaturalId support was, in fact, the very reason I had recently upgraded the annotations jar. On a hunch, I reverted the persistence mapping back to the xml form. Debug, the metadata returns the correct value... test, and YES, finally -- the queries issued are in line with my expectations and the rows present in the database. I. Have. Won.

Being the good open-source citizen, I made a hibernate forum post that detailed my findings, including simplified sample code demonstrating the problem. The good folks on the hibernate forum (after questioning the silliness of my contrived example) were quick to recognize the problem, and I got a hibernate jira issue opened.

The workaround, obviously, is to leave the xml mapping in place until the fix makes it into a hibernate release. Not too bad of a deal, I guess, considering the overall win I now have in my cache hit ratio.

In conclusion -- if it makes sense for your data model, the natural id query cache optimization can be a huge performance win for your app. If you have immutable, or rarely changed objects with a constant natural key lookup -- look into the Criteria natural id restriction. And, use the xml mapping until the bug is fixed.

ps -- there is one other performance note to consider, actually. If you use the natural id query and it returns no rows, this NULL result will not be cached. So, if you have more of these than 'object/row found' results, you will still get tons of queries that you don't expect. Either stop using the natural key optimization (if 'not found' is more common), or extend your object/schema to include a 'not supported' field. In our case, the lack of a row meant "not supported," and we also had a "not supported" flag for the case where something was supported but then went away. Where something was frequently "not supported," I simply went ahead and created the object/row and set the flag to false -- thus ensuring the natural key optimization was not subverted.


Free Light Show

A few weeks ago some really bad storms came through central Texas. Since I watch TV via DVR, I had no idea there was a storm coming, much less the severity of them. I didn't find out until I came back to my computer, and several friends had IMd me "it's headed right for you!" and "are you in your basement" (to which, in Texas -- no one has basements).

I flipped on the news, and the weatherman was speaking of some obvious rotation "two miles north/northeast of pflugerville." Yeah, uhh... that's right about where I live. In fact, the weatherman's map showed it probably went about a mile north of my house.

But, strangely enough -- no damage. The rest of Austin had plenty, but we survived without a scratch. Perhaps, because we have no trees at all in our neighborhood...

I captured some lo-res video of the lightning. Give it a bit at the beginning, it's not easy to capture lightning video, you know.


Tuesday, June 17, 2008

Dueling Shared Libraries

I'm famous! Well, in my own mind. I've made it onto Jan Kneschke's blog, as we tackled an ugly problem on Mac OS X when trying to use a self-built MySQL Monitor Agent.

I won't plagiarize, so go read it yourself.

JUnit Eats Your Memory

Once again, we started getting OutOfMemoryErrors in our JUnit test suite. Some necessary refactoring re-introduced me to an old foe that I had battled once before. I figured, what the hell -- an excellent opportunity for my first blog post.

There's an important FAQ in JUnit that should be bold, blinking, italics, and highlighted. The gist is -- if you assign an instance member in a JUnit TestCase (declaration, setUp(), or otherwise), you damn well better assign it to null in tearDown().

You see, JUnit creates an instance of the TestCase class for EVERY test method. It would have you believe, the way it is architected, that the TestCase class in question is instantiated once, every test method is called with setUp() and tearDown() pre- and post-, and all is glorious. But that is not the case. Every test method gets its own test case instance. Oh -- and the important detail -- ALL of those instances have references held on them until the END of the test suite run.

So, if you do any sort of non-trivial testing with even moderately large (but transient) objects, be sure to set the refs in your testcase to null in tearDown() so they can be garbage collected.
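In other words, the pattern is this (a minimal JUnit 3 style sketch; the oversized List is just a stand-in for whatever fixture your setUp() builds):

import java.util.ArrayList;
import java.util.List;

import junit.framework.TestCase;

public class BigFixtureTest extends TestCase {

    // stands in for whatever moderately large, transient fixture you build
    private List<byte[]> bigFixture;

    protected void setUp() throws Exception {
        super.setUp();
        bigFixture = new ArrayList<byte[]>();
        for (int i = 0; i < 100; i++) {
            bigFixture.add(new byte[1024 * 1024]); // ~100MB of test data
        }
    }

    protected void tearDown() throws Exception {
        // this TestCase instance is referenced until the END of the suite run,
        // so null the field or the fixture can never be garbage collected
        bigFixture = null;
        super.tearDown();
    }

    public void testSomethingWithTheFixture() {
        assertEquals(100, bigFixture.size());
    }
}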

Better yet -- let's find a newer tool than JUnit. Just because it is ubiquitous doesn't mean it is the best tool.