Tuesday, April 14, 2009

Arbitrary Replication/Proxy Topologies with MySQL Sandbox

How would you like to use MySQL Sandbox to deploy and test any replication topology imaginable? Do you want to describe load balancer and proxy configurations, too? If you already love sandbox (or even if you don't), and you want to play with arbitrarily complex replication/proxy topologies... read on.

Sandbox Is Your Friend


MySQL Sandbox is a great tool. As of this writing, version 3 is feature complete -- and the new features are greatly desired. Replication support has been there a while for master/slave setups with as many slaves as you want and for replication rings of any size. Version 3 adds multi-level replication trees on a multi-sandbox using sbtool.

But what if the provided replication setups do not quite match what you need? Granted, I'll argue they are sufficient for 99% of setups. But there's always that 1%... Then, throw MySQL Proxy in the mix and things really start get interesting. Sandbox (at the moment) supports very limited proxy configuration.

It's Just a Directed Graph


Giuseppe and I had some conversations a while back on ways to describe replication topologies. In its purest form, a replication topology is a directed graph with some limitations imposed by MySQL replication technology itself (a slave cannot have two master). One can either think of it as Master->Slave (conceptual data flow direction), or Slave->Master (connection initiation direction). I choose to think of it as Master->Slave, because for my wee bitty brain it helps to think in terms of the event flow.

After a modicum of research, I realized the Graphviz DOT language seems very well suited to describing graph/network topologies. Sure, the intention of dot is mostly for visualization. But the language itself is generic enough to describe graphs, nodes, edges, and any properties you want on any of those components. Perfect.

Enter Sandbox

So, we have a generic graph description language. We have a tool that can deploy nodes and configure them. Sandbox is in perl. There exists Graph::Reader::Dot in CPAN. We. Are. Golden. In short order, I've got the latest MySQL::Sandbox and Graph::Reader::Dot installed into my perl environment. It doesn't take long to start reading in dot files and deploying mysqls with sandbox. The rest is just SMOP™ -- a Simple Matter of Programming™.

Enough with the History Lesson

The perl code is here: make_sandbox_topology. Get the perl prerequisites from CPAN, MySQL::Sandbox and Graph::Reader::Dot. Then in typical sandbox fashion, you invoke it as

./make_sandbox_topology { VERSION | tarball } topology.dot

Some example dot topologies follow now...

Master/Slave
digraph MasterSlave
{
Master->Slave;
}

Master, Slaves

digraph MasterSlaves
{
Master->Slave1;
Master->Slave2;
}
... and repeat with as many slaves as you want.

Tree
digraph Tree
{
A->B;
A->C;
B->D;
B->E;
C->F;
C->G;
D->H;
D->I;
E->J;
E->K;
F->L;
F->M;
G->N;
G->O;
}

Master/Master

digraph MasterMaster
{
A->B->A;
}

Ring

digraph Ring
{
A->B->C->D->A;
}

Ring With Slaves

digraph Ethanol
{
C1->C2->C1;
C1->H11;
C1->H12;
C1->H13;
C2->H21;
C2->H22;
C2->O->H;
}

digraph Benzene
{
C1->C2->C3->C4->C5->C6->C1;
C1->H1;
C2->H2;
C3->H3;
C4->H4;
C5->H5;
C6->H6;
}

Ok. Enough sillyness. I think you get the point.

Let There Be Proxy

Proxies are just another node in the topology. But, we have to identify them to sandbox as a different type, so they are not deployed as mysql instances. In dot notation, this becomes Proxy [class=proxy]. How about a simple load balanced setup with two slaves.
digraph LoadBalanced
{
Master->SlaveA;
Master->SlaveB;

LoadBalancer [class=proxy];
LoadBalancer->SlaveA;
LoadBalancer->SlaveB;
}

The LoadBalancer when deployed and started, is now configured for the two slave backends. Notice, however, that there is no lua goodness for the proxy (yet)....

Let's do one more with a combination of load balancing and pure proxy.

digraph ProxyExample
{
Master->SlaveA;
Master->SlaveB;
Master->StandBy;

LoadBalancer [class=proxy];
LoadBalancer->SlaveA;
LoadBalancer->SlaveB;

ProxyM [class=proxy];
ProxyM->Master;

ProxyA [class=proxy];
ProxyB [class=proxy];

ProxyA->SlaveA;
ProxyB->SlaveB;
}
In this case, SlaveA and SlaveB are load balanced, and every other mysql node has a proxy in front him (for who knows what, query analysis perhaps?) except for the standby slave.

Node Configuration

The previous section introduced the first concept of a node's configuration. A node's class defaults to mysql, but can be specified as proxy. All other attribute=value pairs are passed as configuration to the deployed type.

Topology Level
digraph Simple
{
baseport=22000;
Master->Slave;
}

Currently, the only 'topology' level supported attribute is baseport. It, as you might guess, defines the starting port for port assignment via sandbox.

MySQL Class

Any argument that is valid for a mysqld command line or my.cnf configuration file is fair game for a mysql class node configuration. Want to hard code the port? Want to make a slave read-only? No problem.

Slave [port=9999, read_only=1]

Now the slave will be deployed via sandbox with the given port 9999 and read_only in the my.sandbox.cnf

Proxy Class

Any argument that is valid for mysql-proxy command line or ini file setting may be set as node configuration. The class=proxy attribute is not forwarded, as this is a recognized 'internal' node config option.

Proxy [class=proxy, "log-level"=debug, "proxy.proxy-lua-script"="share/mysql-proxy/lb.lua"]

Notice, the quotes around some of the keys and values is due to dot grammar limitations.

Default Node Attributes

Obviously, a lot of configuration could end up being repeated. And that sort of defeats the purpose of a lot of the automation that sandbox provides. Dot provides a default node attribute concept that we can build on.

node ["proxy.log-level"=message, "proxy.proxy-lua-script" = "script.lua",
"mysql.read_only"=1, "mysql.default-character-set"=utf8 ]

"node" is literally n o d e. That is the dot syntax for "every node should contain these attributes." Every proxy will have log-level as message and the script.lua lua script, unless the node definition itself overrides that same attribute. Likewise for mysql nodes. Every one would be read only and utf8, unless the node itself is explicit about those attributes.
Since the options are really passed to ALL nodes, I had to scope the attributes to what classes they really belong to. To me, it makes it slightly more readable at a 'default attribute' level anyway, but one could envision being smarter about what attributes belong to what classes, and doing some automagic scoping for you. The only problem then would be collisions -- which could be resolved with explicit scoping in those cases.

Future

I've already joined sandbox-developers on launchpad. Next step is to work with the team and get it on the blueprints, and then into sandbox release.

Once on launchpad, we'll start soliciting user ideas for features.

Issues and Limitations
  • Dot language. Attribute key and value grammar is pretty limited, so remember to quote them. I don't catch parse errors and die (yet), so if you get funky errors later remember to check your dot syntax.
  • Only valid replication topologies are allowed. If you try to configure two masters for one slave, we'll catch it. Wait, this is actually a feature, not a limitation...
  • master/slave binlog events not proxied, so it is caught and disallowed. Maybe proxy will proxy binlogs... soon?
  • Currently it does not honor all sandbox options (like user and password, or port conflict detection and resolution)
  • the Blogger editor REFUSES to maintain proper whitespace in my dot examples. Ideally, I like to use 4-space indent. But it keeps removing them for me, so I'm tired of adding them back in. sorry.
Please remember, this is just a rough cut at the concept. Architecture and design could use some serious refactoring. But, it's out there to use and abuse -- and we'll see about getting into MySQL::Sandbox proper.

If desired, I can followup in further blogs with actual usage examples and walk-throughs. Perhaps the best place for documentation though is on mysql forge.