capricious diatribes: MySQL Query Analyzer: A Peek Under the Hood

I had a hard time coming up with a title for this blog post. I really wanted to capture its essence. Several discarded titles include, but are not limited to:

How It Works
The Gory Details
Everything You Never Wanted to Know
The Insider's Perspective

I picked the lamest one, hoping that it is catchy enough to ensnare the unfortunate soul that comes across my blog. Just in case you don't know by now, I work for MySQL (ok, Sun) on the Enterprise Tools team. We're approaching (some sort of) release of MySQL Enterprise Monitor 2.0, with the headline feature Query Analysis.

What I want to achieve with this article is to give definitions, data, flow, etc., that describe how we look at things internally in the architecture of the application. This is intended to give the hapless reader an idea of what Query Analysis can do, and consequently what it cannot.

What is a query?

I actually call them statements inside the application. To me, query seemed to imply asking about something. And we track more than that. Is an INSERT or UPDATE a query? Is a CREATE TABLE a query? Not in my feeble mind. So, I chose statement because it is generic enough to encompass them all, yet still seems to have ties to database vernacular. Too bad "statement analysis" doesn't quite have that sexy marketing appeal.

A unique statement/query is an application's query that has been normalized and that executes against a given database. By normalized, I mean all literals are replaced with '?', (some) comments are stripped, the SQL keywords are capitalized, and extraneous whitespace is collapsed.
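To make that concrete, here is a minimal Python sketch of this kind of normalization. It is an illustration under assumptions, not the proxy's actual code: the keyword list is a hypothetical subset, the regexes only handle single-quoted strings and plain numbers, and it strips every comment it finds rather than only some.

```python
import re

# Hypothetical, incomplete keyword list; a real normalizer would tokenize SQL properly.
KEYWORDS = {"select", "from", "where", "as", "and", "or", "insert", "into",
            "values", "update", "set", "delete", "limit", "order", "by"}


def _upper_if_keyword(match):
    word = match.group(0)
    return word.upper() if word.lower() in KEYWORDS else word


def normalize(sql):
    # Strip /* ... */ and "-- ..." comments (naive: ignores comments inside strings).
    sql = re.sub(r"/\*.*?\*/", " ", sql, flags=re.S)
    sql = re.sub(r"--[^\n]*", " ", sql)
    # Replace single-quoted string literals, then numeric literals, with '?'.
    sql = re.sub(r"'(?:[^'\\]|\\.)*'", "?", sql)
    sql = re.sub(r"\b\d+(?:\.\d+)?\b", "?", sql)
    # Capitalize keywords; identifiers keep their case.
    sql = re.sub(r"\b[A-Za-z_]\w*\b", _upper_if_keyword, sql)
    # Collapse extraneous whitespace.
    return " ".join(sql.split())


print(normalize("select  * from t where id = 42 /* hint */ and name = 'bob'"))
# SELECT * FROM t WHERE id = ? AND name = ?
```

The point is only that many concrete queries collapse into one normalized text, which is what makes the per-statement statistics possible.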

There is a subtle phrase in there that is very important -- "a unique statement ... executes against a given database." Basically, we identify a statement key as {database, normalized text}. That means if you have the statement SELECT 1 and execute it in the context of database 'test', then switch databases to webapp_foo and execute it again, in the mind of MySQL Query Analyzer it is a different statement.
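A tiny Python sketch of what that key implies (the `record` helper and the counter are hypothetical names for illustration, not anything from the product):

```python
from collections import Counter

# Executions counted per (database, normalized text) pair.
executions = Counter()


def record(database, normalized_text):
    # The database is part of the key, so identical text in two
    # databases lands in two different buckets.
    executions[(database, normalized_text)] += 1


record("test", "SELECT ?")
record("webapp_foo", "SELECT ?")
record("test", "SELECT ?")

print(len(executions))                   # 2
print(executions[("test", "SELECT ?")])  # 2
```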

This works well for things like replication scale out, where a slave farm will have bunches of queries executing against the same database(s). It breaks down if, somehow, your application uses multiple databases, yet you really want the queries to be treated the same for aggregation. We haven't found this to be the typical use case (yet), but if we get feedback we can revisit this in future releases.

I call a statement that executes against a particular MySQL server instance a 'statement summary,' because it is at this level that we actually track and summarize execution statistics. More on this later...

Where does the data come from?

Enter the MySQL Proxy, with some enterprise-only plugins and Lua scripts. The proxy sits between your application and your MySQL database server. All queries passing through it are captured, normalized, and stored until they are reported back to the Monitor server. The infamous agent is now a plugin that shares a process inside the chassis, alongside the proxy. Proxy and agent communicate, in this case via Lua, in order to push query data to the agent plugin so the agent can send it up to the Monitor in the same fashion as any other data it collects.

There are potentially other sources of data... but that's a story for a different day.

What data is tracked?

For a given statement, we track:

number of executions
execution time
min/max execution time(s)
rows returned/affected
min/max row(s)
result set size
min/max result set size

The normalized statement is tracked by the proxy. Each statement that fits in that normalized bucket is added to the ongoing statistics. The Monitor backend periodically asks for a snapshot (currently every minute), at which point the proxy resets its statistical aggregation (think, "read-clear" semantics).
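The read-clear idea can be sketched in a few lines of Python. This is a simplified model under assumptions, not the proxy's implementation: `Metric`, `Bucket`, and `Collector` are hypothetical names, and the units of the recorded values are whatever the caller passes in.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional, Tuple


@dataclass
class Metric:
    """Running total plus min/max for one measured quantity."""
    total: int = 0
    minimum: Optional[int] = None
    maximum: Optional[int] = None

    def add(self, value: int) -> None:
        self.total += value
        self.minimum = value if self.minimum is None else min(self.minimum, value)
        self.maximum = value if self.maximum is None else max(self.maximum, value)


@dataclass
class Bucket:
    """Statistics for one normalized statement since the last snapshot."""
    count: int = 0
    exec_time: Metric = field(default_factory=Metric)
    rows: Metric = field(default_factory=Metric)
    result_bytes: Metric = field(default_factory=Metric)


class Collector:
    def __init__(self) -> None:
        self._buckets: Dict[Tuple[Optional[str], str], Bucket] = {}

    def record(self, database, text, exec_time, rows, result_bytes) -> None:
        bucket = self._buckets.setdefault((database, text), Bucket())
        bucket.count += 1
        bucket.exec_time.add(exec_time)
        bucket.rows.add(rows)
        bucket.result_bytes.add(result_bytes)

    def snapshot(self) -> Dict[Tuple[Optional[str], str], Bucket]:
        """Hand back everything gathered so far and start over (read-clear)."""
        taken, self._buckets = self._buckets, {}
        return taken


collector = Collector()
collector.record("em2", "SELECT ?", exec_time=394, rows=1, result_bytes=38)
collector.record("em2", "SELECT ?", exec_time=613, rows=1, result_bytes=45)
first = collector.snapshot()    # contains one bucket with count == 2
second = collector.snapshot()   # empty: the first read cleared the statistics
```

Because each snapshot only covers the interval since the previous one, snapshots never overlap and can be added together later without double counting.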

Also, some extra metadata is kept with the query, for housekeeping reasons:

database (which can actually be null - not every query executes in a database context)
normalized query text
a hash of the text
query type (select, insert, update, etc.)

What does the agent/proxy actually send to the Monitor?

Actually? You don't want to know. When you get your hands on it, you can turn up the agent log level to debug and see for yourself. It's not like it's a secret. It's just not worth mentioning here.

Conceptually? Now there's a better topic... Although we don't take full advantage of it in all agent communications, we've moved to a more RESTful approach to agent interaction. Internally, we take the legacy XML agent protocol and turn that into sexy 'representations' that we put to internal REST APIs. An example JSON representation for the query data looks something like this:

PUT /instance/mysql/statementsummary/f8ccc1d2-f63b-4c30-9d42-1d383e626510.em2.02a6ed17a7627d1777f33a6062826951
{
  "name": "f8ccc1d2-f63b-4c30-9d42-1d383e626510.em2.02a6ed17a7627d1777f33a6062826951",
  "values": {
    "bytes": 325,
    "count": 8,
    "database": "em2",
    "exec_time": 3174,
    "max_bytes": 45,
    "max_exec_time": 613,
    "max_rows": 1,
    "min_bytes": 38,
    "min_exec_time": 394,
    "min_rows": 1,
    "query_type": "SELECT",
    "rows": 8,
    "text": "SELECT hibtag0_ . tag_id AS tag1_18_ , hibtag0_ . tag AS tag18_ FROM tags hibtag0_ WHERE hibtag0_ . tag = ?",
    "text_hash": "02a6ed17a7627d1777f33a6062826951"
  }
}

This is our conceptual representation of a statement summary (remember, a statement summary is a summarized statement execution on a single MySQL instance). Statements themselves do not actually exist on the proxy/agent side or in the communications, since the proxy and agent are always tracking one MySQL instance. The statement itself is purely a Monitor-server side concept that we track for rollup purposes. The UUID you see in the RESTy URI is the mysql.inventory unique identifier, something we had to introduce for 2.0 to identify, without any uncertainty, a specific MySQL instance we are monitoring. After the UUID come the database name and the text hash. Together, uuid.db.text_hash defines the key for this statement summary.
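As a rough Python sketch of how such a key could be put together: the function names are hypothetical, and the use of MD5 is purely an assumption based on the hash in the example being 32 hex digits long. The case where the database is null is not covered.

```python
import hashlib


def text_hash(normalized_text):
    # Assumption: MD5 is used here only because it yields 32 hex digits,
    # like the example above. The real hash function is not stated.
    return hashlib.md5(normalized_text.encode("utf-8")).hexdigest()


def summary_key(instance_uuid, database, hash_of_text):
    # uuid.db.text_hash, as described above.
    return f"{instance_uuid}.{database}.{hash_of_text}"


def summary_uri(key):
    return "/instance/mysql/statementsummary/" + key


key = summary_key("f8ccc1d2-f63b-4c30-9d42-1d383e626510", "em2",
                  "02a6ed17a7627d1777f33a6062826951")
print(summary_uri(key))
```

Fed the values from the example representation, this reproduces the "name" field and the path of the PUT request shown above.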

What does the Monitor do with the statement summaries?

From the statement summary, we pull out the pieces which define a statement, which happen to be immutable. They are the database, query text, text hash, and query type. After storing that, we create a statement summary, which basically just ties the statement to a server instance. We also store the creation time of this object/row, so we can track the first time this query was ever seen on a server (or ever seen, period, if you take the min and group by the statement). Finally, we take the ongoing snapshot data and store that.
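Those three steps can be modeled with an in-memory Python sketch. Everything here is hypothetical (the `Store` class, its dictionaries, and the choice of (database, text hash) as the statement identity); the real Monitor keeps this in its own repository.

```python
import time

# The immutable pieces that define a statement.
STATEMENT_FIELDS = ("database", "text", "text_hash", "query_type")


class Store:
    def __init__(self):
        self.statements = {}  # (database, text_hash) -> immutable statement fields
        self.summaries = {}   # (instance_uuid, database, text_hash) -> first-seen time
        self.snapshots = []   # (summary key, timestamp, statistics)

    def put(self, instance_uuid, values, now=None):
        now = time.time() if now is None else now
        statement = {name: values[name] for name in STATEMENT_FIELDS}
        stats = {k: v for k, v in values.items() if k not in STATEMENT_FIELDS}
        statement_key = (values["database"], values["text_hash"])
        summary_key = (instance_uuid,) + statement_key
        self.statements.setdefault(statement_key, statement)
        self.summaries.setdefault(summary_key, now)  # keep the earliest sighting
        self.snapshots.append((summary_key, now, stats))

    def first_seen(self, statement_key):
        # "Ever seen, period": the min of the per-server creation times.
        return min(seen for key, seen in self.summaries.items()
                   if key[1:] == statement_key)


store = Store()
values = {"database": "em2", "text": "SELECT ?", "text_hash": "abc",
          "query_type": "SELECT", "count": 1, "exec_time": 5}
store.put("server-a", values, now=100)
store.put("server-a", values, now=160)
store.put("server-b", values, now=130)
```

After these three calls there is one statement, two statement summaries (one per server), and three snapshots.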

Tell me how the aggregation works.

Aggregation works on two levels. Because we split the immutable statement data from the statement summary (per-server) data, we can aggregate on each of them.

Let's revisit what these snapshots mean. Each statement summary data instance is a snapshot of the summarized statistics for the normalized query over the time period since the last time it was reported. This gives us the powerful ability to then aggregate these snapshots over an arbitrary time range chosen by the end user.

Consider the single server aggregation case. If count is the number of executions in a given minute, and we store this count snapshot over many minutes, then the total count for an arbitrary time range is merely the sum of the counts (to the minimum granularity of our collection frequency, which is currently one minute). Rows and result set size aggregation work exactly the same way.

The mins and maxes aren't magical either. The mins/maxes per snapshot are for that one minute range. If you want the mins/maxes over the larger requested range, just take the min of the mins and the max of the maxes.

There is one "trick" on the monitor-side to show averages over the arbitrary range. We show average execution time, average rows returned/affected, and average result set size. What does an 'average' mean in this case? We want an average per query execution (not per unit time). Average execution time would be the total execution time divided by the total executions. Well, we aggregate execution time, and we aggregate count... so we just take those columns in the query we build, and write an expression column of time/count. There's your average. Same for rows and bytes -- just take the aggregated totals and divide by the count.
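Here is the single-server arithmetic as a short Python sketch. The field names follow the JSON representation shown earlier; the `aggregate` function itself and the second snapshot's numbers are made up for illustration (the Monitor does this in the query it builds, not in Python).

```python
MEASURES = ("exec_time", "rows", "bytes")


def aggregate(snapshots):
    """Combine per-interval snapshots into figures for the whole range."""
    snapshots = list(snapshots)
    if not snapshots:
        raise ValueError("no snapshots in the requested range")
    total = {"count": sum(s["count"] for s in snapshots)}
    for name in MEASURES:
        total[name] = sum(s[name] for s in snapshots)
        total["min_" + name] = min(s["min_" + name] for s in snapshots)
        total["max_" + name] = max(s["max_" + name] for s in snapshots)
        # Average per execution, not per unit of time.
        total["avg_" + name] = total[name] / total["count"] if total["count"] else None
    return total


minute_1 = {"count": 8, "exec_time": 3174, "min_exec_time": 394, "max_exec_time": 613,
            "rows": 8, "min_rows": 1, "max_rows": 1,
            "bytes": 325, "min_bytes": 38, "max_bytes": 45}
minute_2 = {"count": 2, "exec_time": 826, "min_exec_time": 400, "max_exec_time": 426,
            "rows": 2, "min_rows": 1, "max_rows": 1,
            "bytes": 80, "min_bytes": 40, "max_bytes": 40}

result = aggregate([minute_1, minute_2])
print(result["count"], result["avg_exec_time"])  # 10 400.0
```

Note that the average is computed from the aggregated totals. Averaging the per-snapshot averages would weight a quiet minute the same as a busy one.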

Now, what about summaries across multiple servers? Since we have the immutable statement data broken out, it's just another aggregation on top of the server one, grouped by the statement identity. And the same sum, min, max, and average tricks still apply.
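The cross-server rollup can be sketched the same way in Python. The names `merge` and `rollup` are hypothetical, and the statement identity is assumed to be (database, text hash). Because sums, mins, and maxes can be combined pairwise in any order, per-server results merge cleanly into per-statement ones.

```python
SUMMED = ("count", "exec_time", "rows", "bytes")
RANGED = ("exec_time", "rows", "bytes")


def merge(a, b):
    """Combine two sets of aggregated statistics into one."""
    out = {name: a[name] + b[name] for name in SUMMED}
    for name in RANGED:
        out["min_" + name] = min(a["min_" + name], b["min_" + name])
        out["max_" + name] = max(a["max_" + name], b["max_" + name])
    return out


def rollup(per_server):
    """per_server: iterable of (instance_uuid, database, text_hash, stats)."""
    per_statement = {}
    for _instance_uuid, database, hash_of_text, stats in per_server:
        key = (database, hash_of_text)  # the server drops out of the key
        if key in per_statement:
            per_statement[key] = merge(per_statement[key], stats)
        else:
            per_statement[key] = dict(stats)
    return per_statement


stats_a = {"count": 8, "exec_time": 3174, "min_exec_time": 394, "max_exec_time": 613,
           "rows": 8, "min_rows": 1, "max_rows": 1,
           "bytes": 325, "min_bytes": 38, "max_bytes": 45}
stats_b = {"count": 2, "exec_time": 826, "min_exec_time": 400, "max_exec_time": 426,
           "rows": 2, "min_rows": 1, "max_rows": 1,
           "bytes": 80, "min_bytes": 40, "max_bytes": 40}

rolled = rollup([("server-a", "em2", "h1", stats_a),
                 ("server-b", "em2", "h1", stats_b),
                 ("server-a", "test", "h1", stats_b)])
print(rolled[("em2", "h1")]["count"])  # 10
```

Averages are left out of `merge` on purpose: they are derived at the end from the merged totals, exactly as in the single-server case.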

Can I see statistics per user?

Not at this time. The user stats might be pretty easy to tack on in the next release. The user would just become another element of the {database, query} key. Then we could still selectively aggregate at the user level, the server level, or the statement level, or any combination.

Sounds neat! When can I see it?

Currently, query analysis with Monitor 2.0 is in limited beta to select enterprise customers. Email beta-enterprise@mysql.com for more information. There is also a special MySQL Query Analysis forum. It has some good links, and as people get their hands on it, I bet it becomes the central place to go for all things Query Analysis.

What does Query Analysis in MySQL Enterprise Monitor 2.0 look like?

Check out our product manager's blog. He's better at doing touchy feely marketing flap like screen shots. ;)

Conclusion

Here, I covered all the gory details: what data we collect, where we collect it from, and how we conceptually view and aggregate it on our backend. I'd like to touch on how the UI interacts with the data -- specifically group views, server views, sorting, searching, and aliasing, but I think this article has hit capacity. Perhaps a blog for another day...

9 comments:

Sheeri K. Cabral said...: Do you have any stats about memory usage? We used proxy in production and had a program automatically restart proxy when it used more than 512 Mb; the lua script was a failover/failback script, used with 18 persistent connections (so never more than that active -- we had a proxy instance live on each webserver, so the webservers could just connect to localhost, and so resource usage would be spread across many machines). Memory usage fluctuated wildly, often it was under 256 Mb, and the characteristics did not speak to a memory leak (it would go up to, say, 400 Mb, then drop down, etc)....but once a day or more the memory usage was high enough to need to restart proxy. 1/2 Gb of memory was too much usage for a proxy, so we got rid of it.; September 3, 2008 at 8:07 AM
Oldag said...: I, unfortunately, do not. I work on the centralized monitor server and not the proxy specifically.

However, according to our proxy devs, this memory is way out of line with what we expect. Could you open a bug report at bugs.mysql.com?; September 3, 2008 at 8:43 AM

Friday, August 29, 2008